Exploring topological data analysis for information extraction: application to recognition of Arabic machine-printed numerals

Bibliographic record details
Title: Exploring topological data analysis for information extraction: application to recognition of Arabic machine-printed numerals
Authors: Djamel Bouchaffra, Fayçal Ykhlef
Source: Journal of Engineering and Applied Science, Vol 71, Iss 1, Pp 1-27 (2024)
Publisher information: Springer Science and Business Media LLC, 2024.
Publication year: 2024
Subject terms: Data Analysis, Arabic machine-printed numeral recognition, Artificial intelligence, Information extraction, Topological data analysis, 02 engineering and technology, Pattern recognition (psychology), Topology, 01 natural sciences, 7. Clean energy, Statistical Topology, Arabic numerals, Betti number, Barcode filtration, Connected Component Labeling Algorithms, Chaincode representation, FOS: Mathematics, 0202 electrical engineering, electronic engineering, information engineering, 0101 mathematics, Topology (electrical circuits), Shape Analysis, Topological Methods, Invariant (physics), Discrete mathematics, 15. Life on land, Dynamic-time warping, Engineering (General). Civil engineering (General), Computer science, Numeral system, Algorithm, Topological Data Analysis in Science and Engineering, Computational Theory and Mathematics, Combinatorics, Mathematical physics, Computer Science, Physical Sciences, Computer Vision and Pattern Recognition, TA1-2040, Mathematics
Description: This manuscript explores the capability of topological data analysis (TDA), based on homology theory (HT, a subfield of algebraic topology), to extract relevant information for the recognition of easily confused Arabic machine-printed numerals. Topological properties may significantly reduce the confusion between numerals such as “1” and “4” when data sets are small: digit 1 has no hole, whereas digit 4 has one hole. Our contribution consists of evaluating the benefit of TDA and its invariant descriptors, such as Betti numbers, in machine-printed Arabic numeral recognition. Our investigation proceeds in two steps: (i) we extract the Betti-number invariant features of each numeral image and partition the ten numerals into three clusters with respect to these features; (ii) we then classify a test image by assigning it to its corresponding cluster and mapping it to a numeral using dynamic time warping as a metric defined in the Freeman chain code space. We compared the proposed approach with major state-of-the-art methods that use TDA in various ways for character recognition. The advantages and limitations of TDA are then discussed on the basis of the numeral recognition results.
Document type: Article
Other literature type
Language: English
ISSN: 2536-9512
1110-1903
DOI: 10.1186/s44147-023-00346-x
DOI: 10.60692/fjpdj-hpg05
DOI: 10.60692/arpgv-qk982
Access link: https://doaj.org/article/4ab11352a1f1403bab2a71401224badb
Rights: CC BY
Accession number: edsair.doi.dedup.....a75f052e552b4362909fb45b21b68f6f
Database: OpenAIRE
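
Illustrative note on the description: the two steps named in the abstract (Betti-number features, then dynamic time warping over Freeman chain codes) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the toy 5x5 glyphs, the choice of 8-connectivity for ink and 4-connectivity for background, and the circular local cost between chain-code directions are all assumptions made for illustration. Here b0 counts connected ink components and b1 counts holes.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(grid, target, neighbors):
    """Count connected regions of cells equal to `target` (breadth-first search)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == target and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def betti_numbers(image):
    """Return (b0, b1) for a binary image given as rows of 0/1 values."""
    width = len(image[0])
    # Pad with background so the outer background forms a single region.
    padded = [[0] * (width + 2)]
    padded += [[0] + [int(v) for v in row] + [0] for row in image]
    padded += [[0] * (width + 2)]
    b0 = count_components(padded, 1, N8)      # ink components
    b1 = count_components(padded, 0, N4) - 1  # background regions minus the outer one
    return b0, b1


def dtw_distance(code_a, code_b):
    """Dynamic time warping between two Freeman chain codes (directions 0..7)."""
    inf = float("inf")
    n, m = len(code_a), len(code_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = abs(code_a[i - 1] - code_b[j - 1])
            local = min(diff, 8 - diff)  # assumed circular direction difference
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical toy glyphs, not taken from the paper's data set.
ONE = ["00100", "01100", "00100", "00100", "01110"]
FOUR = ["00010", "00110", "01010", "11111", "00010"]
EIGHT = ["01110", "01010", "01110", "01010", "01110"]

if __name__ == "__main__":
    for name, glyph in [("1", ONE), ("4", FOUR), ("8", EIGHT)]:
        print(name, betti_numbers(glyph))
    print(dtw_distance([0, 0, 2, 2], [0, 2, 2]))
```

On these toy glyphs the hole count alone separates "1" (no hole) from "4" (one hole), which is the distinction the description highlights. Within a group of numerals sharing the same Betti numbers, a test glyph's chain code would then be compared against reference chain codes with `dtw_distance`, and the closest reference would give the label.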