Semi-supervised segmentation of RNA 3D structures using density-based clustering

Bibliographic Details
Title: Semi-supervised segmentation of RNA 3D structures using density-based clustering
Authors: Le, Quoc Khang; Angel, Eric; Tahi, Fariza; Postic, Guillaume
Contributors: Davesne, Frédéric
Source: Computational and Structural Biotechnology Journal. 27:3966-3984
Publisher Information: Elsevier BV, 2025.
Publication Year: 2025
Subject Terms: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], semi-supervised segmentation, RNA conformation, lncRNAs, 3D domains, [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM]
Description: A growing body of evidence shows that the biological activity of RNA molecules is due not only to their primary and secondary structures, but also to their spatial conformation. This is analogous to proteins, where investigating function, folding, or evolution often requires dividing the three-dimensional (3D) structure into subparts that can be studied individually. These independent substructures, known as protein “3D domains”, are geometrically defined as compact and spatially separate regions of the polypeptide chain. For RNA macromolecules, however, to the best of our knowledge, no equivalent 3D-based concept has yet been formulated. We present RNA3DClust, an application of the Mean Shift clustering algorithm to the RNA 3D structure partitioning problem. For this work, a dedicated post-clustering procedure was developed to address the peculiarities of delimiting 3D domains in RNA conformations. Tuning and benchmarking RNA3DClust required us to create reference datasets of RNA 3D domain annotations and to devise a new scoring function, the Chain Segment Distance (CSD), for assessing segmentation quality. Importantly, we show that the domain decompositions produced by RNA3DClust are consistent with those based on RNA biological function and evolution. Finally, the emerging interest in long non-coding RNAs (lncRNAs) and their likelihood of containing folded regions has motivated us to generate an additional reference dataset of lncRNA predicted conformations. The resulting delineations of 3D domains by RNA3DClust illustrate the potential of our method for analyzing lncRNA 3D structures. Source code and datasets are freely available for download on the EvryRNA platform at: https://evryrna.ibisc.univ-evry.fr.
Document Type: Article
Language: English
ISSN: 2001-0370
DOI: 10.1016/j.csbj.2025.08.037
DOI: 10.1101/2025.01.12.632579
Access URL: https://univ-evry.hal.science/hal-05250691v1
https://doi.org/10.1016/j.csbj.2025.08.037
Rights: CC BY
CC BY-NC-ND
Accession Number: edsair.doi.dedup.....86d62cfdb0b8d7bd28f05a01314d8eba
Database: OpenAIRE