Egocentric video description based on temporally-linked sequences

Bibliographic record details
Title: Egocentric video description based on temporally-linked sequences
Authors: Bolaños, Marc, Peris-Abril, Álvaro, Casacuberta Nolla, Francisco, Soler, Sergi, Radeva, Petia
Contributors: Universitat de Barcelona, Centro de Investigación Pattern Recognition and Human Language Technology, Generalitat Valenciana, Centres de Recerca de Catalunya, Ministerio de Economía y Competitividad, Institució Catalana de Recerca i Estudis Avançats, Agencia de Gestión de Ayudas Universitarias y de Investigación, Repositorio Institucional de la Universitat Politècnica de València Riunet
Source: Recercat. Dipòsit de la Recerca de Catalunya
Articles publicats en revistes (Matemàtiques i Informàtica)
Dipòsit Digital de la UB
Universidad de Barcelona
RiuNet. Repositorio Institucional de la Universitat Politècnica de València
Universitat Politècnica de València (UPV)
Publication Status: Preprint
Publisher: Elsevier BV, 2018.
Publication year: 2018
Subject terms: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Vídeo en l'ensenyament, Computer Science - Computer Vision and Pattern Recognition, Deep learning, 02 engineering and technology, Video description, Aprenentatge visual, Video tapes in education, 03 medical and health sciences, 0302 clinical medicine, Multi-modal learning, 0202 electrical engineering, electronic engineering, information engineering, Visual learning, Egocentric vision, LENGUAJES Y SISTEMAS INFORMATICOS
Abstract: Egocentric vision consists of acquiring images throughout the day from a first-person point of view using wearable cameras. Automatic analysis of this information allows the discovery of daily patterns that can improve the user's quality of life. A natural topic that arises in egocentric vision is storytelling, that is, how to understand and tell the story lying behind the pictures. In this paper, we tackle storytelling as an egocentric sequence description problem. We propose a novel methodology that exploits information from temporally neighboring events, matching precisely the nature of egocentric sequences. Furthermore, we present a new method for multimodal data fusion consisting of a multi-input attention recurrent network. We also publish the first dataset for egocentric image sequence description, consisting of 1,339 events with 3,991 descriptions, from 55 days acquired by 11 people. Finally, we show that our proposal outperforms classical attentional encoder-decoder methods for video description.
19 pages, 10 figures, 3 tables. Submitted to Journal of Visual Communication and Image Representation
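The multi-input attention fusion named in the abstract (a decoder state attending over several modality streams, with the resulting contexts combined at each step) can be sketched roughly as follows. This is a hypothetical NumPy illustration under stated assumptions, not the authors' code: the function names, dot-product scoring, and feature shapes are all assumptions made for the example.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, keys):
    """Dot-product attention: score every timestep of one modality
    against the decoder state and return the weighted context vector."""
    scores = keys @ query        # (T,)
    weights = softmax(scores)    # (T,), sums to 1
    return weights @ keys        # (d,)

def multi_input_attention_step(state, modalities):
    """One decoder step fusing several modality sequences via attention.
    `modalities` is a list of (T_i, d) arrays (e.g. visual and text
    features); each contributes its own attended context."""
    contexts = [attend(state, m) for m in modalities]
    # Concatenated state + contexts would feed the RNN cell's next step.
    return np.concatenate([state] + contexts)

rng = np.random.default_rng(0)
state = rng.normal(size=4)
video_feats = rng.normal(size=(5, 4))  # hypothetical visual features
text_feats = rng.normal(size=(7, 4))   # hypothetical text features
fused = multi_input_attention_step(state, [video_feats, text_feats])
print(fused.shape)  # prints (12,)
```

In the paper's actual model the scoring is a learned attention mechanism inside a recurrent network; the sketch above only shows the fusion pattern of attending each input stream separately and concatenating the contexts.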
Document type: Article
File format: application/pdf
Language: English
ISSN: 1047-3203
DOI: 10.1016/j.jvcir.2017.11.022
DOI: 10.48550/arxiv.1704.02163
Funder DOI: 10.13039/501100003030
Funder DOI: 10.13039/501100003329
Funder DOI: 10.13039/501100003359
Access links: http://diposit.ub.edu/dspace/bitstream/2445/143165/1/684160.pdf
https://arxiv.org/abs/1704.02163
http://hdl.handle.net/2445/143165
https://riunet.upv.es/handle/10251/141941
https://www.sciencedirect.com/science/article/pii/S1047320317302316
https://doi.org/10.1016/j.jvcir.2017.11.022
http://diposit.ub.edu/dspace/handle/2445/143165
https://hdl.handle.net/10251/141941
Rights: Elsevier TDM
CC BY NC ND
arXiv Non-Exclusive Distribution
Accession number: edsair.doi.dedup.....03b3fb36d5e093e56f2cec129a770d4a
Database: OpenAIRE