June 2024 ‑ September 2024
Chang Liu, Xiangyang Wang, Chun Yu*, Yingtian Shi, Chongyang Wang, Ziqi Liu, Chen Liang, Yuanchun Shi
The limited accuracy of eye tracking on smartphones restricts its use. Existing RGB-camera-based eye tracking relies on extensive datasets, and its accuracy could be enhanced by continuous fine-tuning with calibration data implicitly collected during interaction. In this context, we propose COMETIC (Cursor Operation Mediated Eye-Tracking Implicit Calibration), which introduces a cursor-based interaction and exploits the inherent correlation between cursor and eye movement. By filtering valid cursor coordinates as proxies for the ground truth of gaze and fine-tuning the eye-tracking model with the corresponding images, COMETIC improves accuracy during interaction. Both filtering and fine-tuning use pre-trained models and can be further improved with personalized, dynamically updated data. Results show COMETIC achieves an average eye-tracking error of 208.04 px (1.2 cm), a 49.64% improvement over the model without fine-tuning. We also found that filtering cursor points whose actual distance to gaze falls within 250 to 300 px (1.44 to 1.73 cm) yields the best eye-tracking results.
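As a rough illustration of the reported numbers, the Python sketch below assumes a screen density of about 173 px per cm (derived from 208.04 px ≈ 1.2 cm); the function names and the distance-band filter are illustrative of the offline analysis, not the system's run-time logic (where true gaze is unknown).

import numpy as np

# Reported numbers imply roughly 173 px per cm (208.04 px ≈ 1.2 cm); assumed here.
PX_PER_CM = 208.04 / 1.2

def px_to_cm(distance_px: float) -> float:
    """Convert an on-screen distance from pixels to centimetres."""
    return distance_px / PX_PER_CM

def select_proxy_points(cursor_xy: np.ndarray,
                        gaze_xy: np.ndarray,
                        min_px: float = 250.0,
                        max_px: float = 300.0) -> np.ndarray:
    """Keep cursor samples whose actual distance to the true gaze lies in
    [min_px, max_px]; this band gave the best eye-tracking results in the
    offline analysis (illustrative sketch, not the deployed filter)."""
    dist = np.linalg.norm(cursor_xy - gaze_xy, axis=1)
    return (dist >= min_px) & (dist <= max_px)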
The Workflow of COMETIC (Cursor Operation Mediated Eye-Tracking Implicit Calibration). The system is divided into an interaction phase and an eye-tracking & calibration phase. (A) When the user activates the cursor, it appears at the estimated gaze location. Because of estimation error, the user refines the cursor position by sliding their thumb; upon releasing the thumb, a click is executed at the cursor's current position. (B) Model 1 performs eye tracking, taking the image sequence as input and outputting the gaze location (the cursor position at activation). Model 2 assists in fine-tuning Model 1 by selecting cursor coordinates from the refinement process that can serve as proxies for the gaze location. These selected data points are then paired with images and used to fine-tune Model 1. In conclusion, our system leverages data implicitly collected during user interaction to improve eye-tracking accuracy and enhance the interaction experience.
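A minimal sketch of the interaction phase in (A), assuming hypothetical callback names (on_activate, on_thumb_move, on_release) and a gaze_model object standing in for Model 1; the actual system may organize these events differently.

from dataclasses import dataclass
from typing import List, Tuple
import numpy as np

@dataclass
class CalibrationSample:
    frames: List[np.ndarray]                  # camera frames captured while the cursor was visible
    cursor_trace: List[Tuple[float, float]]   # cursor positions during thumb refinement
    release_xy: Tuple[float, float]           # where the click was finally executed

class CursorSession:
    """Interaction phase: the cursor appears at the estimated gaze, the user
    refines it with the thumb, and the trace plus frames are logged for
    implicit calibration (illustrative structure)."""

    def __init__(self, gaze_model):
        self.gaze_model = gaze_model          # stands in for Model 1
        self.samples: List[CalibrationSample] = []
        self._frames: List[np.ndarray] = []
        self._trace: List[Tuple[float, float]] = []

    def on_activate(self, frame: np.ndarray) -> Tuple[float, float]:
        x, y = self.gaze_model.predict(frame)  # cursor starts at the estimated gaze
        self._frames, self._trace = [frame], [(x, y)]
        return x, y

    def on_thumb_move(self, frame: np.ndarray, dx: float, dy: float) -> Tuple[float, float]:
        x, y = self._trace[-1]
        self._frames.append(frame)
        self._trace.append((x + dx, y + dy))   # user corrects the estimation error
        return self._trace[-1]

    def on_release(self) -> Tuple[float, float]:
        sample = CalibrationSample(self._frames, self._trace, self._trace[-1])
        self.samples.append(sample)            # kept for later fine-tuning
        return sample.release_xy               # click executed here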
Structure of COMETIC Models. We use Model 1 to convert Video and Video.info into gaze positions. We use Model 2 to select cursor positions as proxies for the ground truth of gaze position, and use the resulting video-proxy pairs to fine-tune Model 1 for better eye-tracking performance.
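A minimal PyTorch-style sketch of the calibration step, assuming Model 2 outputs a per-sample validity score and Model 1 maps image sequences to 2D gaze coordinates; shapes, thresholds, and hyperparameters are illustrative and not taken from the paper.

import torch
import torch.nn as nn

def fine_tune_gaze_model(model_1: nn.Module,
                         model_2: nn.Module,
                         frames: torch.Tensor,      # (N, T, C, H, W) image sequences
                         cursor_xy: torch.Tensor,   # (N, 2) cursor positions from refinement
                         features: torch.Tensor,    # (N, F) interaction features fed to Model 2
                         epochs: int = 3,
                         lr: float = 1e-4) -> nn.Module:
    """Use Model 2 to pick cursor positions that can stand in for gaze,
    then fine-tune Model 1 on the resulting video-proxy pairs."""
    with torch.no_grad():
        keep = model_2(features).squeeze(-1) > 0.5  # assumed: validity score per sample
    frames, targets = frames[keep], cursor_xy[keep]

    optimizer = torch.optim.Adam(model_1.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    model_1.train()
    for _ in range(epochs):
        pred = model_1(frames)                      # predicted gaze (x, y) per sequence
        loss = loss_fn(pred, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model_1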
Publication:
Submitted to CHI 2025; passed the first round of review.