Publication:
A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications

dc.citedby7
dc.contributor.authorUddin I.en_US
dc.contributor.authorAwan H.H.en_US
dc.contributor.authorKhalid M.en_US
dc.contributor.authorKhan S.en_US
dc.contributor.authorAkbar S.en_US
dc.contributor.authorSarker M.R.en_US
dc.contributor.authorAbdolrasol M.G.M.en_US
dc.contributor.authorAlghamdi T.A.H.en_US
dc.contributor.authorid58993722900en_US
dc.contributor.authorid57298070200en_US
dc.contributor.authorid57192190458en_US
dc.contributor.authorid57204809479en_US
dc.contributor.authorid57194609918en_US
dc.contributor.authorid37122644300en_US
dc.contributor.authorid35796848700en_US
dc.contributor.authorid57456914500en_US
dc.date.accessioned2025-03-03T07:41:35Z
dc.date.available2025-03-03T07:41:35Z
dc.date.issued2024
dc.description.abstractRNA modifications play an important role in actively controlling recently created formation in cellular regulation mechanisms, which link them to gene expression and protein. The RNA modifications have numerous alterations, presenting broad glimpses of RNA's operations and character. The modification process by the TET enzyme oxidation is the crucial change associated with cytosine hydroxymethylation. The effect of CR is an alteration in specific biochemical ways of the organism, such as gene expression and epigenetic alterations. Traditional laboratory systems that identify 5-hydroxymethylcytosine (5hmC) samples are expensive and time-consuming compared to other methods. To address this challenge, the paper proposed XGB5hmC, a machine learning algorithm based on a robust gradient boosting algorithm (XGBoost), with different residue based formulation methods to identify 5hmC samples. Their results were amalgamated, and six different frequency residue based encoding features were fused to form a hybrid vector in order to enhance model discrimination capabilities. In addition, the proposed model incorporates SHAP (Shapley Additive Explanations) based feature selection to demonstrate model interpretability by highlighting the high contributory features. Among the applied machine learning algorithms, the XGBoost ensemble model using the tenfold cross-validation test achieved improved results than existing state-of-the-art models. Our model reported an accuracy of 89.97%, sensitivity of 87.78%, specificity of 94.45%, F1-score of 0.8934%, and MCC of 0.8764%. This study highlights the potential to provide valuable insights for enhancing medical assessment and treatment protocols, representing a significant advancement in RNA modification analysis. © The Author(s) 2024.en_US
dc.description.natureFinalen_US
dc.identifier.ArtNo20819
dc.identifier.doi10.1038/s41598-024-71568-z
dc.identifier.issue1
dc.identifier.scopus2-s2.0-85203292932
dc.identifier.urihttps://www.scopus.com/inward/record.uri?eid=2-s2.0-85203292932&doi=10.1038%2fs41598-024-71568-z&partnerID=40&md5=e986cd50a63803de551f3bd63ca071e8
dc.identifier.urihttps://irepository.uniten.edu.my/handle/123456789/36210
dc.identifier.volume14
dc.publisherNature Researchen_US
dc.relation.ispartofAll Open Access; Gold Open Access
dc.sourceScopus
dc.sourcetitleScientific Reports
dc.subject5-Methylcytosine
dc.subjectAlgorithms
dc.subjectCytosine
dc.subjectHumans
dc.subjectMachine Learning
dc.subject5 methylcytosine
dc.subject5-hydroxymethylcytosine
dc.subjectcytosine
dc.subjectalgorithm
dc.subjecthuman
dc.subjectmachine learning
dc.subjectmetabolism
dc.titleA hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modificationsen_US
dc.typeArticleen_US
dspace.entity.typePublication