
Releases: joshuaDeal/search-sasquatch

search-sasquatch v0.4.0

20 Aug 19:12

v0.4.0

I've successfully added basic support for performing image searches. I hope to flesh this new feature out some more in the future; currently it's kind of primitive.

I've also switched from Python's mysql.connector module to a module specific to MariaDB. I would have used this mariadb module from the beginning had I known of its existence. Apparently I hadn't done enough research before choosing which modules to use when interfacing with the database on the backend, and I assumed backwards compatibility with MySQL was universal throughout MariaDB. Live and learn. You can learn more about this issue from this thread on Stack Overflow.
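For reference, a minimal sketch of what the swap looks like on the Python side. The connection parameters here are hypothetical, not the project's actual config; the mariadb module's connect() takes roughly the same keyword arguments as mysql.connector's:

```python
def get_connection(user, password, database, host="localhost", port=3306):
    """Open a database connection via the MariaDB-specific module.

    Close to a drop-in for mysql.connector.connect() in simple cases,
    though the two libraries differ in edge cases (hence the switch).
    """
    import mariadb  # imported lazily so the sketch stands alone
    return mariadb.connect(user=user, password=password,
                           host=host, port=port, database=database)
```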

That's everything major that has changed for this release. Outside of that, there have been a few improvements here and there to results.php (specifically, I've made loading the pagination links its own function), and a few new terms have been introduced to naughty-words.txt.

The Future

Optimization is going to become my new top priority. The biggest problem plaguing this project at the moment is how uselessly slow it is. Fixing that will probably require much learning and code reviewing on my part, which I think will be good for me, especially considering that this project exists as a learning exercise for me anyway. We'll see. ¯\_(ツ)_/¯

search-sasquatch v0.3.0

25 Jun 07:47

v0.3.0

The biggest new feature is safe search. Users can now turn safe search on and off to filter out 'unsafe' search results. A page is marked as safe or unsafe by extractor.py, which uses naughty-words.txt to decide whether a page should be filtered out based on its content.
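As a rough illustration of the kind of check extractor.py performs. The function and word list here are simplified stand-ins, not the project's actual implementation:

```python
def is_safe(page_text, naughty_words):
    """Return False if any term from the word list appears in the page text."""
    text = page_text.lower()
    return not any(word in text for word in naughty_words)

# Hypothetical usage: load the list once, then tag each extracted page.
# with open("naughty-words.txt") as f:
#     naughty_words = {line.strip().lower() for line in f if line.strip()}
```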

The next major new feature is the style sheet selector that appears in the footer of the page. It's pretty straightforward: it lets the end user pick a preferred color scheme from a small selection. The feature hasn't really enhanced the core functionality of this project at all; I simply introduced it because I wanted to see how something of that sort would work. I suppose it exists "just because."

crawler.py and extractor.py have finally received some minor but much-needed bug fixes. There's still a lot of work to be done in this area, but I think I've made some steps in the right direction.

I've also added a handful of new sites to crawlme.md.

The Future

As for the future, I have nothing super new to say here. Just the usual, with the exception of an image search functionality. I've had this idea swimming around in my head for a while, and I think it would be fun to try to implement.

I think I'm going to start slowing down with this project to focus on some others for a while. I'm still far from done here, and I'm proud of what I've accomplished with this so far. However, I'm starting to burn out on this and I want to put my energy towards something else.

search-sasquatch v0.2.0

04 Jun 04:36

v0.2.0

I've addressed several of the suggestions proposed in v0.1.0's release notes. First, I've made the searching process external to the PHP script; it's now implemented in a Python script called search.py. search.py can output JSON data that can be used by results.php (formerly search.php). Alternatively, search.py can simply output in a format that's easy to read and digest in the command line, sort of making it work as a standalone front end to the search engine.
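A sketch of that dual output mode. The field names here are my guesses for illustration, not necessarily what search.py actually emits:

```python
import json

def format_results(results, as_json=True):
    """Render results as JSON for results.php, or as plain text for the terminal."""
    if as_json:
        return json.dumps({"results": results})
    lines = [f"{i}. {r['title']} -- {r['url']}" for i, r in enumerate(results, 1)]
    return "\n".join(lines)
```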

Because pulling the searching functionality out of search.php and making it its own program made the entire code base more coherent, I was easily able to introduce pagination functionality to the project. Pagination originates inside search.py: using command line options, you can set the number of results to load per page along with which page to load. Its JSON output includes information useful to results.php (the total number of results and the total number of pages), which results.php uses to build the pagination links that now appear at the bottom of every results page.
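The pagination arithmetic itself is simple; here's a sketch of what search.py presumably computes (names and the returned fields are illustrative, not the actual code):

```python
import math

def paginate(total_results, per_page, page):
    """Compute the offset/limit for a page, plus the totals results.php needs."""
    total_pages = max(1, math.ceil(total_results / per_page))
    page = min(max(1, page), total_pages)  # clamp out-of-range page numbers
    return {"offset": (page - 1) * per_page,
            "limit": per_page,
            "total_results": total_results,
            "total_pages": total_pages}
```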

init.sh now creates a directory called '/opt/search-sasquatch'. This directory is for users like http or www-data; it contains files they will need in order to keep the site running correctly.

I've changed the way the URLs are formatted. Doing this involved changing results.php so that we're making GET requests instead of using POST. This change has been very convenient for quite a few reasons, and I suspect it's related to the fact that opensearch.xml now seems to have the intended effect when using Firefox.
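One nice side effect of GET is that a results page is just a URL you can bookmark or share. A sketch of building one (the parameter names 'q' and 'page' are assumptions on my part, not confirmed from the code):

```python
from urllib.parse import urlencode

def results_url(query, page=1, base="results.php"):
    """Build a shareable results URL with the query carried in GET parameters."""
    return f"{base}?{urlencode({'q': query, 'page': page})}"
```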

Small improvements here and there have been made to style.css. Particularly worth mentioning are some improvements made with viewing the site on mobile devices in mind. I have also added a footer to the page. Currently, it just contains a link to this repository on GitHub, but as I add more things to the site, like probably an about page, more links will begin to populate it.

I've added a new markdown file called crawlme.md. It's a list of websites that may be worth starting a web crawl on. Almost all of its content was pulled from ChatGPT, so you should be suspicious of it and its effectiveness.

It's been a wild journey from v0.1.0 to now. Crazy to think that it's only been around a week. Now, let us look forward to v0.3.0.

The Future

And by us, I really only mean me, seeing as I'm the only person working on this project. However, that may change someday; one thing I'd like to do is write up some contributing guidelines. I hope that doing so will act as an invitation to collaborate. I'm very new to all this, so I'm not currently too sure how to go about gaining people's interest in this project.

I continue to neglect enforcing proper error handling. I need to just dedicate a whole day to doing this project-wide. I also need to address the performance issues that arise as the database grows in size.

I'm indecisive about how the site should look, so I'm thinking about adding a drop-down menu that lets the user switch between style sheets. Creating that could be a fun little learning experience.

I also really need to grow my understanding of cybersecurity. Or I just need to find a buddy willing to do some bug bounty hunting free of charge.

I still don't have much as far as documentation goes. This will probably bite me in the butt, because it will act to discourage potential collaborators, and probably my future self as well.

I also desperately want to create a safe search feature. Nothing too fancy, it will most likely just consist of a naughty word list that will be used to filter out naughty results.

I think that just about covers everything. All these words here make me feel like I've done something.

search-sasquatch v0.1.0

25 May 10:12

v0.1.0

Initial release, so there's nothing new to say, but I would like to talk a bit about my plans for future updates.

The Future

One major change I'd like to make is the handling of the actual search. In its current state, search-sasquatch performs its TF-IDF searching logic inside the webroot/search.php file. For the sake of modularity and scalability, I think this search should be performed by a program (probably another Python script) external to webroot/search.php. This program will take the search terms from webroot/search.php as input, and it will output its findings in a format that webroot/search.php can understand.
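For context, TF-IDF scoring in its plainest form looks something like this. This is a toy sketch of the general technique, not the project's actual code:

```python
import math
from collections import Counter

def tf_idf_scores(query_terms, documents):
    """Score each document against the query with a plain TF-IDF sum."""
    n = len(documents)
    tokenized = [doc.lower().split() for doc in documents]
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            # term frequency: how often the term appears in this document
            tf = counts[term] / len(tokens) if tokens else 0.0
            # document frequency: how many documents contain the term at all
            df = sum(1 for t in tokenized if term in t)
            # rarer terms get a larger weight (smoothed to avoid div-by-zero)
            idf = math.log((n + 1) / (df + 1)) + 1
            score += tf * idf
        scores.append(score)
    return scores
```

A real index would precompute these statistics in the database rather than rescanning every document per query, which is part of why this belongs outside the PHP request path.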

I should also note that crawler.py and extractor.py could both benefit from better error handling and performance enhancement.

The front end could also be improved, aesthetically speaking, but doing so is neither my expertise nor my passion. Thus, web design improvements are not currently my highest priority.

Security always needs to be considered and observed. I've tried my best to keep security in mind while creating search-sasquatch; however, I am an amateur and am therefore prone to making mistakes. Patching bugs with security implications will always take priority over implementing new features.

init.sh's functionality can probably be expanded upon. It exists to make setting up an instance easier, and I have no doubt that more can be done in this area. The same goes for purge.sh.

opensearch.xml exists and I've tried to implement OpenSearch functionality, but it currently doesn't quite work as intended. I'm considering switching to Chromium as my daily browser because it seems to handle integrating new search engines in a much more straightforward way.

Documentation: I've hardly created any. This definitely needs to be addressed before I really lose track of what's going on.

Anyway, I'm sure there's more I wanted to add to this rant, but it's been a long night and I'm very sleepy, to the point of struggling to spell words and stay on topic. I'll add more to this if important things come to mind later, after I've gotten some sleep.