added go.mod & changelog
cyclone-github authored Jan 2, 2024
1 parent 2d3e916 commit a0dfe0d
Showing 3 changed files with 42 additions and 2 deletions.
29 changes: 29 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,29 @@
### version 0.7.1
```
added progress bars to word / ngrams processing & file writing operations
added RAM usage monitoring
optimized order of operations for faster processing with less RAM
TO-DO: refactor code
```
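The RAM usage monitoring mentioned above could be approximated with the standard `runtime` package. This is only a sketch under that assumption: the helper names `bytesToMiB` and `heapInUseMiB` are hypothetical and are not taken from the tool's source, which may measure memory differently.

```go
package main

import (
	"fmt"
	"runtime"
)

// bytesToMiB converts a byte count to mebibytes.
// Hypothetical helper, not part of the spider source.
func bytesToMiB(b uint64) float64 {
	return float64(b) / (1024 * 1024)
}

// heapInUseMiB reports the Go heap currently allocated, in MiB.
// It reads the runtime's own statistics, so it reflects Go-managed
// memory only, not the full resident set size of the process.
func heapInUseMiB() float64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return bytesToMiB(m.Alloc)
}

func main() {
	words := make([]string, 0, 1_000_000)
	for i := 0; i < 1_000_000; i++ {
		words = append(words, fmt.Sprintf("word%d", i))
	}
	fmt.Printf("words held: %d, heap in use: %.1f MiB\n", len(words), heapInUseMiB())
}
```

`runtime.ReadMemStats` briefly stops the world, so a monitor like this is better polled on a timer than called inside a hot loop.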
### version 0.7.0
```
added feature to allow crawling specific file extensions (html, htm, txt)
added check to keep crawler from crawling offsite URLs
added flag "-delay" to avoid rate limiting (-delay 100 == 100ms delay between URL requests)
added write buffer for better performance on large files
increased crawl depth from 5 to 100 (not recommended, but enabled for edge cases)
fixed out of bounds slice bug when crawling URLs with NIL characters
fixed bug when attempting to crawl deeper than available URLs to crawl
fixed crawl depth calculation
optimized code which runs 2.8x faster vs v0.6.x during bench testing
```
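As an illustration of the offsite check and the `-delay` flag listed above, the following sketch compares hostnames with `net/url` and sleeps between requests. It is written under assumptions: `isOnsite` is a hypothetical helper, the URLs are placeholders, and the tool's real logic may differ.

```go
package main

import (
	"flag"
	"fmt"
	"net/url"
	"strings"
	"time"
)

// isOnsite reports whether candidate points at the same host as base.
// Hypothetical helper, not taken from the spider source.
func isOnsite(base, candidate string) bool {
	b, err := url.Parse(base)
	if err != nil {
		return false
	}
	c, err := url.Parse(candidate)
	if err != nil {
		return false
	}
	// Resolve relative links against the base before comparing hosts.
	resolved := b.ResolveReference(c)
	return strings.EqualFold(resolved.Hostname(), b.Hostname())
}

func main() {
	// Mirrors the documented behavior: -delay 100 == 100ms between requests.
	delay := flag.Int("delay", 0, "delay in milliseconds between URL requests")
	flag.Parse()

	base := "https://example.com/"
	links := []string{"/about.html", "https://example.com/a.txt", "https://other.example.org/"}
	for _, l := range links {
		if !isOnsite(base, l) {
			fmt.Println("skip offsite:", l)
			continue
		}
		fmt.Println("crawl:", l)
		time.Sleep(time.Duration(*delay) * time.Millisecond)
	}
}
```

Comparing hostnames after resolving relative links keeps paths such as `/about.html` in scope while dropping links to other domains.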
### version 0.6.2
```
fixed scraping logic & ngram creation bugs
switched from gocolly to goquery for web scraping
removed duplicates from word / ngrams output
```
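Removing duplicates from word or ngram output is commonly done with a set of already-seen entries. The sketch below shows one order-preserving way to do it; `dedupe` is a hypothetical name and this is not necessarily how the tool implements it.

```go
package main

import "fmt"

// dedupe returns items with duplicates removed, keeping first occurrences
// in their original order. Hypothetical helper, not from the spider source.
func dedupe(items []string) []string {
	seen := make(map[string]struct{}, len(items))
	out := make([]string, 0, len(items))
	for _, it := range items {
		if _, ok := seen[it]; ok {
			continue // already emitted
		}
		seen[it] = struct{}{}
		out = append(out, it)
	}
	return out
}

func main() {
	ngrams := []string{"go", "web spider", "go", "spider", "web spider"}
	for _, n := range dedupe(ngrams) {
		fmt.Println(n)
	}
}
```

The map uses `struct{}` values so the set costs no extra memory per entry beyond the key itself.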
### version 0.5.10
```
initial github release
```
5 changes: 3 additions & 2 deletions README.md
@@ -18,11 +18,12 @@ Wordlist & ngram creation tool to crawl a given url and create wordlists and/or
- If you want the latest features, compiling from source is the best option since the release version may run several revisions behind the source code.
- Compile from source code info:
- https://github.com/cyclone-github/scripts/blob/main/intro_to_go.txt

### Change Log:
- https://github.com/cyclone-github/spider/blob/main/CHANGELOG.md
### Mentions:
- Go Package Documentation: https://pkg.go.dev/github.com/cyclone-github/spider
- Softpedia: https://www.softpedia.com/get/Internet/Other-Internet-Related/Cyclone-s-URL-Spider.shtml

### Antivirus False Positives:
- Several antivirus programs on VirusTotal incorrectly flag compiled Go binaries as malicious. This issue primarily affects the Windows executable, but is not limited to it. If this concerns you, I recommend carefully reviewing the source code and then compiling the binary yourself.
- Uploading your compiled binaries to https://virustotal.com and leaving an up-vote or a comment would be helpful as well.
10 changes: 10 additions & 0 deletions go.mod
@@ -0,0 +1,10 @@
module spider.go

go 1.21.5

require github.com/PuerkitoBio/goquery v1.8.1

require (
github.com/andybalholm/cascadia v1.3.1 // indirect
golang.org/x/net v0.7.0 // indirect
)
