Missing word lists #40

Open: wants to merge 32 commits into main



@zzJZzz zzJZzz commented Jul 9, 2024

Details

  • I created a Ruby script to compare the repo.json file with the current data/wordlists.yml file and add any entries that were missing.
  • I triple-checked that the formatting was consistent throughout, e.g.:

```yaml
joomla-themes:
  :url: https://raw.githubusercontent.com/danielmiessler/SecLists/master/Discovery/Web-Content/CMS/joomla-themes.fuzz.txt
  :summary: Discovery wordlist
  :categories:
  - discovery
```

vs.

```yaml
alexa-top-1000:
  :url: https://github.com/urbanadventurer/WhatWeb/blob/master/plugin-development/alexa-top-1000.txt
  :summary: The Alexa Top 1000 domain names.
  :categories:
    - dns
    - domains
```

  • When a list had a generic name (e.g. "13"), I renamed it to say where the list came from and what it contains (e.g. weakpass-wordlist), as suggested in the PR comments. The goal was to add enough detail that each list's purpose is apparent at first glance.
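The import script itself is not included in this PR thread. A minimal sketch of the comparison it describes could look like the following, assuming (hypothetically) that repo.json is a JSON array of objects with "name" and "url" keys; the real file may be shaped differently. The helper names missing_entries and render_entry are illustrative only. Entries are matched by URL, and new ones are rendered with list items indented two spaces under :categories:.

```ruby
require "json"
require "set"
require "yaml"

# Returns the repo.json entries whose URL is not already in wordlists.yml.
# ASSUMPTION: repo.json is an array of objects with "name" and "url" keys;
# the real file may be shaped differently.
def missing_entries(repo_json, wordlists_yaml)
  repo = JSON.parse(repo_json)
  # Keys such as :url: load as Symbols, so Symbol must be permitted.
  existing = YAML.safe_load(wordlists_yaml, permitted_classes: [Symbol]) || {}
  known_urls = existing.values.map { |meta| meta[:url] }.to_set
  repo.reject { |entry| known_urls.include?(entry["url"]) }
end

# Renders a new entry with list items indented two spaces under :categories:.
# Summaries containing ": " or "#" would need quoting; this sketch skips that.
def render_entry(name, url, summary, categories)
  lines = ["#{name}:", "  :url: #{url}", "  :summary: #{summary}", "  :categories:"]
  lines.concat(categories.map { |category| "    - #{category}" })
  lines.join("\n") + "\n"
end
```

Matching on URL rather than on name means a renamed entry (e.g. "13" renamed to something more descriptive) is still recognized as already present.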

Testing

  • I also ran the RSpec specs for the file; all 5 tests passed.
  • I visually inspected each new wordlist for anything that looked off.

@postmodern postmodern self-requested a review July 9, 2024 21:21
Member

@postmodern postmodern left a comment


OK, I can at least view the diff now by running git diff main..zzJZzz/missing-word-lists locally. There are so many new entries that GitHub refuses to render the diff for me. A few things I noticed:

  • All summaries must end with a period (.), otherwise the linting tests complain.
    This could be fixed with some vim-fu.
  • One of the entries uses seclist- while others use seclists-. I think we should standardize on seclists-.
  • Should we use SecLists- or seclists-?
  • This raises the question: should all wordlists downloaded from the SecLists repo be prefixed with seclists-/SecLists-?
  • A few of the entry names have spaces in them. The wordlist name should at least try to match the file name (ex: weakpass-Wordlist 38 -> weakpass-Wordlist38).
  • Remove all wordlists whose URL contains Fuzzing/User-Agents/operating-platform. These appear to be User-Agent wordlists grouped by OS/platform. Apparently LG User-Agent strings include a random string, which caused SecLists to generate a bunch of wordlist files that each contain only one (!!) User-Agent string (e.g. https://github.com/danielmiessler/SecLists/blob/master/Fuzzing/User-Agents/operating-platform/lg-4iqj.txt).
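Several of the review points above are mechanical and could be checked per entry. The following is a hypothetical sketch, not the project's actual lint task: lint_entry and fix_summary are illustrative names, and the seclists- rule encodes the standardization proposed above rather than a settled decision. It assumes each entry has already been loaded from the YAML with Symbol keys (:summary, :url).

```ruby
# Hypothetical checks mirroring the review points above; this is NOT the
# project's real lint task, just an illustration of the rules.
BANNED_URL_FRAGMENT = "Fuzzing/User-Agents/operating-platform"

# Returns a list of problems for one entry (empty when the entry is clean).
def lint_entry(name, meta)
  problems = []
  problems << "summary must end with '.'" unless meta[:summary].to_s.end_with?(".")
  problems << "name contains spaces" if name.to_s.include?(" ")
  # Proposed convention: seclists- rather than seclist-.
  problems << "use the seclists- prefix, not seclist-" if name.to_s.start_with?("seclist-")
  if meta[:url].to_s.include?(BANNED_URL_FRAGMENT)
    problems << "single-User-Agent operating-platform list; remove it"
  end
  problems
end

# Script alternative to the vim-fu: append a period only when one is missing.
def fix_summary(summary)
  summary.end_with?(".") ? summary : "#{summary}."
end
```

Note that "seclists-foo" does not start with "seclist-" (the ninth character differs), so correctly prefixed names are not flagged.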

@postmodern
Member

@zzJZzz nice work on writing a script to import the entries!

I may also have to add more Category tags to better describe/group the wordlists.

@zzJZzz
Author

zzJZzz commented Jul 9, 2024 via email

@zzJZzz
Author

zzJZzz commented Jul 12, 2024

Hello. I addressed the changes above, with the exception of:

> This raises the question: should all wordlists downloaded from the SecLists repo be prefixed with seclists-/SecLists-?

I wasn't sure whether there was an official decision on this. I can go back and change it, or perhaps a separate issue could be opened for it?

In addition:

I did remove some wordlists that gave me trouble during the import, maybe about 40. Their names in the original repo contained spaces, which showed up on import. If that's how they are supposed to be, I can encode the spaces as %20; if they should use _ or - instead, I can do that too. When I tried to find those wordlists on the actual website and tried some of the URLs, I had no success, so I removed them for now.

@postmodern
Member

> The original repo had spaces in between which showed up on import. If that's how they are supposed to be then I can add in the %20, or if they should have the _ or -, I can do that too.

I would use hyphens, or whatever the filename of the wordlist is without the file extension (ex: foo-bar100.txt -> foo-bar100).
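The naming rule above can be sketched as a small helper. This is illustrative only: wordlist_name is a hypothetical name, and the sketch assumes the name comes from the last path segment of the URL, with %20 decoded and each run of whitespace replaced by a single hyphen. Underscores are left untouched because the rule is to follow the file name.

```ruby
# Derives an entry name from a wordlist URL: the file name without its
# extension, with spaces (literal or %20-encoded) turned into hyphens.
def wordlist_name(url)
  path = url.split(/[?#]/, 2).first               # drop any query string/fragment
  base = File.basename(path, ".*")                # "foo-bar100.txt" -> "foo-bar100"
  base.gsub("%20", " ").strip.gsub(/\s+/, "-")    # "Wordlist 38" -> "Wordlist-38"
end
```

For example, a URL ending in foo-bar100.txt yields foo-bar100, matching the example above.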

> When I tried to find those wordlists from the actual website and tried out some of the urls to see, I did not have success so I removed for now.

This is a good point. We should check that the wordlist URLs do not return 404. I can probably add tests that send a HEAD request to each wordlist URL and check the status code.
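A minimal sketch of the proposed HEAD-request check, using Ruby's standard Net::HTTP. The names head_status and dead_wordlists are hypothetical, and entries are assumed to be the parsed wordlists.yml hash with Symbol keys. The status lookup is injectable so the filtering logic can be tested without network access, and so a caller could add throttling for hosts that rate-limit.

```ruby
require "net/http"
require "uri"

# Sends a HEAD request and returns the HTTP status code as an Integer.
# Redirects are not followed, so a 3xx is reported as-is.
def head_status(url, timeout: 10)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == "https",
                  open_timeout: timeout, read_timeout: timeout) do |http|
    http.head(uri.request_uri).code.to_i
  end
end

# Returns the names of entries whose URL answers 404.
# status_for defaults to a real HEAD request but can be swapped for a stub.
def dead_wordlists(entries, status_for: method(:head_status))
  entries.select { |_name, meta| status_for.call(meta[:url]) == 404 }.keys
end
```

Some servers answer HEAD with 405 rather than the real status; a fuller version might fall back to a ranged GET in that case.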

@postmodern
Member

postmodern commented Aug 2, 2024

@zzJZzz I have now added a lint:wordlists rake task and hooked it up to GitHub Actions so it runs whenever data/wordlists.yml is changed. It only lints the wordlist metadata in data/wordlists.yml; we still do not check whether each URL is alive, because web servers rate-limit the number of requests we can send. Definitely rebase against main to pick up the new GitHub Actions linting.

Member

@postmodern postmodern left a comment


Looks like there's extra whitespace at the end of :categories:. You can run :%s/\v\s+$//g in vim to strip any trailing whitespace.
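For anyone not using vim, the same cleanup can be done from Ruby. In this sketch (strip_trailing_whitespace is a hypothetical helper) the character class is [ \t] rather than \s, because Ruby's \s also matches newlines, and /\s+$/ would therefore swallow blank lines; vim's \s only matches spaces and tabs.

```ruby
# Removes spaces and tabs at the end of every line, leaving newlines
# (and therefore blank lines) intact.
def strip_trailing_whitespace(text)
  text.gsub(/[ \t]+$/, "")
end

# Applies the cleanup to a file in place, e.g. data/wordlists.yml.
def strip_trailing_whitespace_in_file(path)
  File.write(path, strip_trailing_whitespace(File.read(path)))
end
```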

@zzJZzz
Author

zzJZzz commented Sep 13, 2024

@postmodern My apologies for the delay. My day job had some deadlines, and everyday life got in the way. I believe all of the linting is passing now!

Member

@postmodern postmodern left a comment


Remove any wordlists that contain withcount or have a .csv file extension, since we cannot process multi-column wordlist files.

Also, I think YAML array elements should be indented by two spaces:

```yaml
:categories:
  - one
  - two
```

Sorry this is taking so long. :(
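The removal rule in the review above (withcount or .csv) could be applied with a small filter. This is a sketch under an assumption: it matches on the entry's URL, not its name, since the comment does not say which one "contain withcount" refers to. multi_column? and drop_multi_column are hypothetical names; entries are assumed to be the parsed YAML hash with Symbol keys.

```ruby
# Flags lists that are probably multi-column: the URL mentions "withcount"
# or the file extension is .csv (case-insensitive).
def multi_column?(url)
  path = url.split(/[?#]/, 2).first
  path.include?("withcount") || File.extname(path).casecmp?(".csv")
end

# Returns a copy of the entries hash without the multi-column lists.
def drop_multi_column(entries)
  entries.reject { |_name, meta| multi_column?(meta[:url].to_s) }
end
```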

```
@@ -17,6 +17,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
```

Member


These newlines don't seem necessary.

2 participants