Malicious URL Detection with Distributed Representation and Deep Learning

Do N.Q.; Selamat A.; Lim K.C.; Krejcar O.

doi:10.3233/FAIA220248

dc.contributor.author	Do N.Q.	en_US
dc.contributor.author	Selamat A.	en_US
dc.contributor.author	Lim K.C.	en_US
dc.contributor.author	Krejcar O.	en_US
dc.contributor.authorid	57283917100	en_US
dc.contributor.authorid	24468984100	en_US
dc.contributor.authorid	57889660500	en_US
dc.contributor.authorid	14719632500	en_US
dc.date.accessioned	2023-05-29T09:36:29Z
dc.date.available	2023-05-29T09:36:29Z
dc.date.issued	2022
dc.description	Computer crime; Convolutional neural networks; Embeddings; Natural language processing systems; Recurrent neural networks; Character level; Convolutional neural network; Deep learning; Distributed representation; Learning models; Malicious URL; Natural languages; Phishing detections; Word level; Websites	en_US
dc.description.abstract	There exist numerous solutions to detect malicious URLs based on Natural Language Processing and machine learning technologies. However, there is a lack of comparative analysis among approaches using distributed representation and deep learning. To solve this problem, this paper performs a comparative study on phishing URL detection based on text embedding and deep learning algorithms. Specifically, character-level and word-level embedding were combined to learn the feature representations from the webpage URLs. In addition, three deep learning models, including Convolutional Neural Network (CNN), Bidirectional Gated Recurrent Unit (BiGRU), and Bidirectional Long Short-Term Memory (BiLSTM), were constructed for effective classification of phishing websites. Several experiments were conducted and various evaluation metrics were used to assess the performance of these deep learning models. The findings obtained from the experiments indicated that the combination of the character-level and word-level embedding approach produced better results than the individual text representation methods. Also, the CNN-based model outperformed the other two deep learning algorithms in terms of both detection accuracy and execution time. © 2022 The authors and IOS Press. All rights reserved.	en_US
dc.description.nature	Final	en_US
dc.identifier.doi	10.3233/FAIA220248
dc.identifier.epage	180
dc.identifier.scopus	2-s2.0-85139801300
dc.identifier.spage	171
dc.identifier.uri	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85139801300&doi=10.3233%2fFAIA220248&partnerID=40&md5=5f50d13b144b88a47aa50e591c4c048f
dc.identifier.uri	https://irepository.uniten.edu.my/handle/123456789/26747
dc.identifier.volume	355
dc.publisher	IOS Press BV	en_US
dc.source	Scopus
dc.sourcetitle	Frontiers in Artificial Intelligence and Applications
dc.title	Malicious URL Detection with Distributed Representation and Deep Learning	en_US
dc.type	Conference Paper	en_US
dspace.entity.type	Publication

Collections

SCOPUS
