Minimizing Classification Errors in Imbalanced Dataset Using Means of Sampling

Khan I.; Ahmad A.R.; Jabeur N.; Mahdi M.N.

doi:10.1007/978-3-030-90235-3_38

dc.citedby	1
dc.contributor.author	Khan I.	en_US
dc.contributor.author	Ahmad A.R.	en_US
dc.contributor.author	Jabeur N.	en_US
dc.contributor.author	Mahdi M.N.	en_US
dc.contributor.authorid	58061521900	en_US
dc.contributor.authorid	35589598800	en_US
dc.contributor.authorid	6505727698	en_US
dc.contributor.authorid	56727803900	en_US
dc.date.accessioned	2023-05-29T09:10:49Z
dc.date.available	2023-05-29T09:10:49Z
dc.date.issued	2021
dc.description	Classification (of information); Learning algorithms; Students; Class imbalance; Data level; Over sampling; Performance prediction; SMOTE; Spread subsampling; Student performance; Student performance prediction; Under-sampling; Machine learning	en_US
dc.description.abstract	Classification, a significant application of machine learning, labels each instance of the dataset into one of the predefined classes. Problems occur when the number of instances in the classes is not uniform. The exceptionally uneven class distribution gives rise to class imbalance issues, which tend to degrade the overall performance of the classifier. A set of data-level algorithms is available to adjust the class distribution. Class imbalance emerges frequently in datasets from educational domains, where the number of students with unsatisfactory performance is generally low compared to the number of students with satisfactory outcomes. This paper applies a set of data-level sampling algorithms to a dataset taken from an educational domain. It underlines the consequences arising from classification with an imbalanced dataset. This research confirms that a classification model achieving higher accuracy may not be effective at correctly identifying instances of the minority class. Classification with an imbalanced dataset may produce low recall, precision and F-Measure for classes with fewer instances. The performance of the classification model improves with the application of data-level algorithms. The results also highlight the superiority of oversampling algorithms over undersampling algorithms. © 2021, Springer Nature Switzerland AG.	en_US
dc.description.nature	Final	en_US
dc.identifier.doi	10.1007/978-3-030-90235-3_38
dc.identifier.epage	446
dc.identifier.scopus	2-s2.0-85120523452
dc.identifier.spage	435
dc.identifier.uri	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85120523452&doi=10.1007%2f978-3-030-90235-3_38&partnerID=40&md5=9f40e54fe9a37bbd13aa3f30e8eadb05
dc.identifier.uri	https://irepository.uniten.edu.my/handle/123456789/26465
dc.identifier.volume	13051 LNCS
dc.publisher	Springer Science and Business Media Deutschland GmbH	en_US
dc.source	Scopus
dc.sourcetitle	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
dc.title	Minimizing Classification Errors in Imbalanced Dataset Using Means of Sampling	en_US
dc.type	Conference Paper	en_US
dspace.entity.type	Publication
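
Illustrative note (not part of the published record): the abstract's central claim, that high accuracy can coexist with poor minority-class recall, can be sketched in a few lines of Python. Everything below is an assumption made for illustration: the 95/5 "pass"/"fail" split, the majority-class predictor, and the helper names are hypothetical, and plain random oversampling stands in for the data-level algorithms the paper evaluates (it is not SMOTE and not the authors' procedure).

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 95 satisfactory ("pass") vs 5 unsatisfactory ("fail").
y_true = ["pass"] * 95 + ["fail"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["pass"] * 100

acc = accuracy(y_true, y_pred)
precision, recall, f_measure = per_class_metrics(y_true, y_pred, "fail")

# Rebalance the (hypothetical) training rows before fitting a real model.
rows = list(range(100))  # stand-ins for feature vectors
bal_rows, bal_labels = random_oversample(rows, y_true)
balanced_counts = Counter(bal_labels)

print(f"accuracy={acc:.2f} minority recall={recall:.2f}")
print(dict(balanced_counts))
```

In this toy setting the always-"pass" predictor is right on 95 of 100 instances yet never identifies a "fail" instance, which is the failure mode the abstract describes; after oversampling, both classes contribute the same number of training rows.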

Collections

SCOPUS