This is an implementation of URL-Tokenizer from the paper "Segmentation-based Phishing URL Detection".

ESDAUNG/Segmentation-basedPhishingURLDetection


Segmentation-basedPhishingURLDetection

This is an implementation of URL-Tokenizer in my paper "Segmentation-based Phishing URL Detection".

The paper was published at WI-IAT '21: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. It is available at https://doi.org/10.1145/3486622.3493983

Our hypothesis: information extracted from URLs may reveal significant, meaningful patterns that are essential for phishing detection. To improve the accuracy of URL-based phishing detection, we need a word segmentation technique that splits URLs correctly. URL segmentation needs more care than the word segmentation traditionally used in natural language processing (NLP): tokenization, the process of turning meaningless data into meaningful data, is not as easy to apply to URLs as it is to natural-language text. Our work concentrates on URL segmentation and proposes a novel tokenization method, URL-Tokenizer, that combines the BERT tokenizer and the WordSegment tokenizer.
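To make the idea of combining word segmentation with subword tokenization concrete, here is a minimal, standard-library-only sketch. It is not the URL-Tokenizer from the paper: this README does not describe how the two tokenizers are combined, so the arrangement below (try dictionary word segmentation first, fall back to subword pieces) is an assumption. The toy `VOCAB`, the function names, and the example URL are all hypothetical. `segment_words` stands in for a WordSegment-style segmenter and `wordpiece_like` for a BERT-style WordPiece tokenizer.

```python
import re

# Toy vocabulary, assumed for illustration only; real tools use large vocabularies.
VOCAB = {"http", "https", "www", "com", "paypal", "pay", "pal",
         "secure", "login", "update", "account"}


def split_on_delimiters(url):
    """Lowercase the URL and split it into runs of letters and digits."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]


def segment_words(chunk, vocab):
    """Dictionary word segmentation (stand-in for a WordSegment-style tool).

    Dynamic programming over prefixes: returns the split that covers the
    whole chunk with the fewest vocabulary words, or None if none exists.
    """
    best = [None] * (len(chunk) + 1)
    best[0] = []
    for end in range(1, len(chunk) + 1):
        for start in range(end):
            if best[start] is not None and chunk[start:end] in vocab:
                candidate = best[start] + [chunk[start:end]]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(chunk)]


def wordpiece_like(chunk, vocab):
    """Greedy longest-match subword split (stand-in for a BERT-style tokenizer).

    Continuation pieces are prefixed with '##'. A character with no longer
    vocabulary match is emitted as a single-character piece.
    """
    pieces, start = [], 0
    while start < len(chunk):
        end = len(chunk)
        while end > start + 1 and chunk[start:end] not in vocab:
            end -= 1
        piece = chunk[start:end]
        pieces.append(piece if start == 0 else "##" + piece)
        start = end
    return pieces


def tokenize_url(url, vocab=VOCAB):
    """Hypothetical combination: word segmentation first, subwords as fallback."""
    tokens = []
    for chunk in split_on_delimiters(url):
        words = segment_words(chunk, vocab)
        tokens.extend(words if words is not None else wordpiece_like(chunk, vocab))
    return tokens


print(tokenize_url("http://paypal-secureloginupdate.com/acct"))
```

With this toy vocabulary, the concatenated run `secureloginupdate` is split into dictionary words, while the out-of-vocabulary chunk `acct` falls back to single-character subword pieces. A real pipeline would swap in the actual BERT and WordSegment tokenizers and may combine them differently.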
