Academic Journal
Zero-Shot Pupil Segmentation with SAM 2: A Case Study of Over 14 Million Images
| Title: | Zero-Shot Pupil Segmentation with SAM 2: A Case Study of Over 14 Million Images |
|---|---|
| Authors: | Maquiling, Virmarie; Byrne, Sean Anthony; Niehorster, Diederick C.; Carminati, Marco; Kasneci, Enkelejda |
| Contributors: | Lund University, Joint Faculties of Humanities and Theology, Units, Lund University Humanities Lab (Originator); Lund University, Faculty of Social Sciences, Departments of Administrative, Economic and Social Sciences, Department of Psychology (Originator) |
| Source: | Proceedings of the ACM on Computer Graphics and Interactive Techniques. 8(2):1-16 |
| Subject Terms: | Natural Sciences; Computer and Information Sciences (Computer Engineering); Natural Language Processing |
| Description: | We explore the transformative potential of SAM 2, a vision foundation model, in advancing gaze estimation. SAM 2 addresses key challenges in gaze estimation by significantly reducing annotation time, simplifying deployment, and enhancing segmentation accuracy. Utilizing its zero-shot capabilities with minimal user input (a single click per video), we tested SAM 2 on over 14 million eye images from a diverse range of datasets, including the EDS challenge datasets and Labelled Pupils in the Wild. This is the first application of SAM 2 to the gaze estimation domain. Remarkably, SAM 2 matches the performance of domain-specific models in pupil segmentation, achieving competitive mIoU scores of up to 93% without fine-tuning. We argue that SAM 2 achieves the sought-after standard of domain generalization, with consistent mIoU scores (89.71%-93.74%) across diverse datasets, from virtual reality to "gaze-in-the-wild" scenarios. We provide our code and segmentation masks for these datasets to promote further research. A sketch of the single-click prompting workflow appears below the record. |
| Access URL: | https://doi.org/10.1145/3729409 |
| Database: | SwePub |
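
The prompting mechanism the abstract describes, clicking once on the pupil and letting SAM 2 propagate the mask through the rest of the video, maps onto the video-predictor API of Meta's open-source `sam2` package (github.com/facebookresearch/sam2). The following is a minimal sketch of that workflow under those assumptions; the checkpoint path, frame directory, and click coordinates are illustrative placeholders, not the authors' released code (which the paper provides separately).

```python
# Minimal sketch, assuming Meta's public `sam2` package; paths and the
# click location below are hypothetical, not taken from the paper.
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml",    # model config shipped with the repo
    "./checkpoints/sam2.1_hiera_large.pt",   # downloaded checkpoint (assumed path)
)

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # The video is given as a directory of JPEG frames, as init_state expects.
    state = predictor.init_state(video_path="./eye_video_frames")

    # The "single click per video": one positive point on the pupil in frame 0.
    points = np.array([[320, 240]], dtype=np.float32)  # (x, y), hypothetical pupil location
    labels = np.array([1], dtype=np.int32)             # 1 = foreground (positive) click
    predictor.add_new_points_or_box(
        inference_state=state, frame_idx=0, obj_id=1, points=points, labels=labels
    )

    # Propagate that single prompt through every remaining frame.
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        pupil_mask = (mask_logits[0] > 0.0).cpu().numpy()  # boolean pupil mask per frame
```

The mIoU figures reported in the abstract would then correspond to comparing each propagated `pupil_mask` against a dataset's ground-truth pupil annotations.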