Segmentation-basedPhishingURLDetection

This is an implementation of URL-Tokenizer from my paper "Segmentation-based Phishing URL Detection".

The paper is published in WI-IAT '21: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. It is available at https://doi.org/10.1145/3486622.3493983

Our hypothesis: information extracted from URLs might indicate significant and meaningful patterns essential for phishing detection. To enhance the accuracy of URL-based phishing detection, we need an accurate word segmentation technique to split URLs correctly. However, in contrast to the traditional word segmentation techniques used in natural language processing (NLP), URL segmentation requires meticulous attention: tokenization, the process of turning meaningless data into meaningful data, is not as easy to apply to URLs as it is to natural-language text. In our work, we concentrate on URL segmentation and propose a novel tokenization method, named URL-Tokenizer, that combines the BERT tokenizer and the WordSegment tokenizer.
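To give a rough feel for what URL segmentation involves, here is a minimal, standard-library-only sketch. It is not the code in URL-tokenizer.py and not the method from the paper: the names `TOY_VOCAB`, `segment_chunk`, and `tokenize_url` are hypothetical, and the tiny dictionary-based segmenter is only a stand-in for the WordSegment and BERT tokenizers that URL-Tokenizer combines. It splits a URL on delimiters and then breaks each concatenated letter run into the fewest known words.

```python
import re
from urllib.parse import urlsplit

# Hypothetical toy vocabulary (an assumption for illustration only).
# A real system would rely on the corpora behind WordSegment and on
# BERT's WordPiece vocabulary instead of a hand-written set.
TOY_VOCAB = {
    "secure", "login", "pay", "pal", "paypal",
    "account", "update", "verify", "com", "www",
}


def segment_chunk(chunk, vocab=TOY_VOCAB):
    """Split one alphabetic chunk into the fewest known words.

    Returns [chunk] unchanged when no full segmentation exists.
    """
    n = len(chunk)
    # best[i] holds the shortest word list covering chunk[:i], or None.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            word = chunk[start:end]
            if best[start] is not None and word in vocab:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n] if best[n] is not None else [chunk]


def tokenize_url(url):
    """Tokenize a URL: split on delimiters, then segment each chunk."""
    parts = urlsplit(url.lower())
    text = " ".join([parts.netloc, parts.path, parts.query])
    # Letter runs and digit runs become separate chunks; punctuation is dropped.
    chunks = re.findall(r"[a-z]+|[0-9]+", text)
    tokens = []
    for chunk in chunks:
        tokens.extend(segment_chunk(chunk) if chunk.isalpha() else [chunk])
    return tokens


print(tokenize_url("http://paypalsecurelogin.com/verifyaccount?id=42"))
# ['paypal', 'secure', 'login', 'com', 'verify', 'account', 'id', '42']
```

Chunks that the toy vocabulary cannot cover, such as `id`, are passed through unchanged. In a fuller pipeline a subword tokenizer such as BERT's could handle those leftovers, but how URL-Tokenizer actually combines the two tokenizers is defined by the paper and URL-tokenizer.py, not by this sketch.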
