Publication: Document classification based on kNN algorithm by term vector space reduction
Date
2018
Authors
Moldagulova A.
Sulaiman R.B.
Publisher
IEEE Computer Society
Abstract
Nowadays there is increasing interest in the area of unstructured data analysis. The vast majority of unstructured data is unstructured text data. Retrieving useful information from huge volumes of unstructured text data is a very challenging task. Text mining is a thought-provoking research area, as it tries to discover knowledge from unstructured text. This paper deals with methods used for handling unstructured text data, in particular document classification problems. Most document classification methods are based on the term vector space model for representing unstructured textual data. The term vector space model is easy to implement and provides a uniform representation for documents. However, the feature space for a large collection of documents can reach millions of dimensions and be sparse. One of the issues is therefore to reduce the dimension of the term-document matrix. In this research we propose an approach for reducing the term vector space in the kNN algorithm. © ICROS.
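The abstract does not spell out the reduction approach, so the following is only an illustrative sketch of the general idea: documents become term-count vectors, the term space is shrunk, and a kNN vote is taken over cosine similarity. The document-frequency threshold `min_df`, the toy corpus, and all function names are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def build_vocabulary(docs, min_df=1):
    """Return the sorted terms that occur in at least min_df documents.

    Dropping rare terms is one simple way to reduce the term vector space;
    it is an assumed stand-in, not the method proposed in the paper.
    """
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return sorted(term for term, n in doc_freq.items() if n >= min_df)


def vectorize(text, vocab):
    """Map a text to a term-count vector over the given vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocab]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_classify(query, docs, labels, vocab, k=3):
    """Label the query by majority vote among its k most similar documents."""
    q = vectorize(query, vocab)
    scored = [(cosine(q, vectorize(d, vocab)), lab) for d, lab in zip(docs, labels)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(lab for _, lab in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus, invented for this example.
docs = [
    "the match ended with a late goal",
    "the team scored a goal in the match",
    "the bank raised the interest rate",
    "the interest rate affects the bank loan",
]
labels = ["sport", "sport", "finance", "finance"]

full_vocab = build_vocabulary(docs, min_df=1)     # every term in the corpus
reduced_vocab = build_vocabulary(docs, min_df=2)  # terms seen in 2+ documents

print(len(full_vocab), len(reduced_vocab))
print(knn_classify("a goal decided the match", docs, labels, reduced_vocab))
```

On this toy corpus the threshold removes the terms that occur in only one document, so each document vector becomes shorter while the terms shared within a class are kept. A very common word such as "the" still survives the filter, which suggests that a realistic reduction would need more than a document-frequency cutoff.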
Description
Classification (of information); Data handling; Data mining; Information retrieval systems; Learning algorithms; Text processing; Vectors; Document Classification; Space reductions; Text classifiers; Text mining; Textual data; Unstructured data; Vector spaces