Content-Based Feature Extraction and Extreme Learning Machine for Optimizing File Cluster Types Identification

Ali R.R.
Al-Dayyeni W.S.
Gunasekaran S.S.
Mostafa S.A.
Abdulkader A.H.
Rachmawanto E.H.
Springer Science and Business Media Deutschland GmbH
Recent research in digital forensics attempts to classify image clusters as JPEG or non-JPEG before recovering JPEG image files. This classification step might improve the accuracy of JPEG image recovery and reduce the processing time. In this work, three content-based feature extraction methods are used. The Rate of Change (RoC) is used for tracking relevant bytes in the appropriate groups of their orders. Entropy and Byte Frequency Distribution (BFD) are used to produce an image cluster histogram based on the size of the byte value. Subsequently, we deploy the Extreme Learning Machine (ELM) classifier to evaluate these three features. Based on the generated feature vector, the ELM identifies whether a cluster belongs to a JPEG or a non-JPEG file type. The proposed method is implemented in MATLAB 2017a and is tested and evaluated using the DFRWS dataset. The test results show that the ELM produces high classification accuracy in identifying the file type. The difference in accuracy between the combinations of the tested features is relatively small. The worst accuracy, 72.62%, is obtained when the entropy method is used, and the best accuracy, 93.46%, is obtained when a combination of the three features is used. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
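
The three content-based features named in the abstract can be sketched in a few lines. The following is a minimal illustration in Python, not the authors' MATLAB implementation. It rests on several assumptions that the abstract does not confirm: a cluster is treated as a raw byte string, BFD is a normalized 256-bin histogram, entropy is Shannon entropy over that histogram, and RoC is taken to be the mean absolute difference between consecutive byte values. All function names are hypothetical, and the ELM classifier itself is omitted.

```python
import math
from collections import Counter
from typing import List


def byte_frequency_distribution(cluster: bytes) -> List[float]:
    """Normalized 256-bin histogram of byte values (assumed BFD definition)."""
    if not cluster:
        return [0.0] * 256
    counts = Counter(cluster)
    total = len(cluster)
    return [counts.get(value, 0) / total for value in range(256)]


def shannon_entropy(cluster: bytes) -> float:
    """Shannon entropy in bits per byte, computed from the BFD."""
    bfd = byte_frequency_distribution(cluster)
    return -sum(p * math.log2(p) for p in bfd if p > 0)


def rate_of_change(cluster: bytes) -> float:
    """Mean absolute difference between consecutive bytes (assumed RoC definition)."""
    if len(cluster) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(cluster, cluster[1:])]
    return sum(diffs) / len(diffs)


def feature_vector(cluster: bytes) -> List[float]:
    """Concatenate BFD (256 values), entropy, and RoC into one 258-element vector."""
    return (
        byte_frequency_distribution(cluster)
        + [shannon_entropy(cluster), rate_of_change(cluster)]
    )
```

Under these assumptions, compressed data such as a JPEG cluster would tend to yield entropy close to 8 bits per byte and a nearly flat BFD, whereas plain text would score lower on both. A vector like the one returned by `feature_vector` could then be passed to any classifier, including an ELM.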