The main purpose of the site_size_rechecker.py script is to automate re-checking older sites listed on 512kb.club and to make sure their recorded sizes are up to date.
- Read the sites.yml file
- Analyze the last_checked key-value pair of each entry
  - Sort entries by this key in ascending order, earliest first
  - Non-date values such as "N/A" are listed before dated values
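
The ordering described above can be sketched as follows. This is a minimal illustration and not the script's actual code: the sort_key helper and the sample entries are hypothetical, and the domain field name is an assumption about the layout of sites.yml.

```python
from datetime import date, datetime

def sort_key(site):
    """Put non-date last_checked values first, then dates in ascending order."""
    value = site.get("last_checked")
    if isinstance(value, datetime):
        value = value.date()
    if isinstance(value, date):
        return (1, value)
    if isinstance(value, str):
        try:
            # Dates stored as quoted strings, e.g. "2021-05-01"
            return (1, date.fromisoformat(value))
        except ValueError:
            pass  # e.g. "N/A"
    # Anything that is not a date sorts before every dated entry
    return (0, date.min)

# Hypothetical entries; the real file is loaded with ruamel.yaml
sites = [
    {"domain": "a.example", "last_checked": date(2021, 5, 1)},
    {"domain": "b.example", "last_checked": "N/A"},
    {"domain": "c.example", "last_checked": date(2020, 12, 24)},
]
ordered = sorted(sites, key=sort_key)
print([s["domain"] for s in ordered])
```

Taking the first XY entries of such a sorted list would give the sites that have gone longest without a check.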
- GTmetrix.com account
- Python3 with pip
- ruamel.yaml
- Create an account with GTmetrix.com
  - Go to account settings and generate an API key
- Install the ruamel.yaml Python library (available via pip or a package manager).
- Install python-gtmetrix.
  - It is recommended to simply git clone the repository, since its only dependency is requests, which is available both via pip and as an OS package.
- Create an authentication file named myauth.py with the following format:
      email='email@example.com'
      api_key='96bcab1060d838723701010387159086'
  - email: the address used to create the GTmetrix account
  - api_key: the API key generated in the GTmetrix account settings
- Copy site_size_rechecker.py and myauth.py into the python-gtmetrix folder cloned earlier
Note: Under the new plan, you first receive 100 credits from GTmetrix for testing, after which you get a refill of 10 credits every day at 8:45 PM +0000. This script uses 0.7 credits for each site check, which is about 14 site reports per day per person.
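
As a quick sanity check of the credit arithmetic above, the following sketch computes how many checks a given credit balance covers. The checks_affordable helper is hypothetical; only the figures (100, 10, 0.7) come from the note.

```python
from decimal import Decimal

# Figures taken from the note above
INITIAL_CREDITS = 100
DAILY_REFILL = 10
CREDITS_PER_CHECK = Decimal("0.7")

def checks_affordable(credits):
    """Number of whole site checks that fit into the given credit balance."""
    return int(Decimal(credits) // CREDITS_PER_CHECK)

per_day = checks_affordable(DAILY_REFILL)       # 14
first_run = checks_affordable(INITIAL_CREDITS)  # 142
print(per_day, first_run)
```

Decimal is used here only to avoid floating-point rounding surprises when dividing by 0.7.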
While in the python-gtmetrix folder, run:
python site_size_rechecker.py ../512kb.club/_data/sites.yml XY
Note: XY stands for the number of sites to be checked
A successful run produces a table in Markdown format, which must be included in the PR (see #450 for an example):
Site | old size (team) | new size (team) | delta (%) | GTmetrix | note
---- | --------------- | --------------- | --------- | -------- | ----
[docs.j7k6.org](https://docs.j7k6.org/) | 73.0kb (green) | 72.9kb (green) | -0.1kb (-0%) | [report](https://GTmetrix.com/reports/docs.j7k6.org/PkIra4ns/#waterfall) |
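
For readers wondering how the delta column relates to the two sizes, here is a minimal sketch that reproduces the value in the sample row. The format_delta helper is hypothetical and the script's real formatting code may differ.

```python
def format_delta(old_kb, new_kb):
    """Format the size change as an absolute value in kb and a rounded percentage."""
    delta = new_kb - old_kb
    percent = delta / old_kb * 100
    return f"{delta:.1f}kb ({percent:.0f}%)"

# Sizes from the sample row above
cell = format_delta(73.0, 72.9)
print(cell)
```

A change of -0.1kb on a 73.0kb page is roughly -0.14%, which rounds to the "-0%" shown in the table.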
Note: Each line pauses for about 30 seconds partway through before the rest of it is printed. This is the time it takes for the GTmetrix scan to finish.
The pause is useful: if a site causes a problem, you can see which one it was and either check it manually or remove it from checking.
If everything goes right, you should get table-like output like the example above, which you can paste directly into a GitHub PR.
Note that it "hangs" for about 30 seconds in the middle of each line except the first two: it first prints the site name and old size, then waits for the GTmetrix scan to finish, and only then prints the new size and the rest of the line.
This is done so that if the script encounters an issue while running a GTmetrix scan, you know which site it happened with and can either check it manually or exclude the site from checking.
To decrease the waiting time, edit the python-gtmetrix/gtmetrix/interface.py file and change the number 30 on line 85 to a smaller number. For example, change this line from
time.sleep(30)
to
time.sleep(3)
This will decrease the delay between each check when the script is waiting for the GTmetrix scan to finish.
The recommended poll interval is 1 second. I suggest setting it to 3 seconds. By default, interface.py sets it to 30 seconds.
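
To illustrate what the poll interval controls, here is a minimal, self-contained polling loop. It is a sketch and not the code in interface.py: wait_for_result, fetch_state, and max_polls are hypothetical names, and the "queued", "started", and "completed" state names are assumptions (only "error" appears in this document).

```python
import time

def wait_for_result(fetch_state, poll_interval=3, max_polls=100):
    """Call fetch_state() repeatedly until the scan reaches a final state.

    fetch_state is any callable returning a dict with a 'state' key.
    """
    for _ in range(max_polls):
        response = fetch_state()
        if response["state"] in ("completed", "error"):
            return response
        # A shorter interval means less idle time after the scan finishes
        time.sleep(poll_interval)
    raise TimeoutError("scan did not finish in time")

# Simulated API responses instead of real GTmetrix requests
responses = iter([{"state": "queued"}, {"state": "started"}, {"state": "completed"}])
result = wait_for_result(lambda: next(responses), poll_interval=0)
print(result["state"])
```

With a 30-second interval, a scan that finishes after 5 seconds still blocks the script for the full 30 seconds, which is why lowering the value speeds up each row.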
To exclude a site from checks, either remove it or change its last_checked value to today's date or a date in the future, so that it sorts last in the list.
If you encounter an issue with this script, open a new issue and tag @Lex-2008.
Please provide as much information as possible, such as:
- All output
- The current state of sites.yml: whether it is from the master branch or has been modified
To debug why the script "hangs" when checking some site, edit the python-gtmetrix/gtmetrix/interface.py file and add a new 87th line, so that it looks like this:
Original file
response_data = self._request(self.poll_state_url)
self.state = response_data['state']
Edited file
response_data = self._request(self.poll_state_url)
print(response_data)
self.state = response_data['state']
This will break the nicely formatted table output, but you will see the raw response from the GTmetrix API, for example:
{'resources': {}, 'error': 'An error occurred fetching the page: HTTPS error: hostname verification failed', 'results': {}, 'state': 'error'}
Currently, this script doesn't check for any errors returned by the GTmetrix.com API. That's the next item on my list. Moreover, I will get rid of the python-gtmetrix dependency, since it adds more trouble than benefit.
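
Until such checking exists, a response like the one above could be inspected with a small helper along these lines. This is only a sketch: scan_error is a hypothetical function, not part of the script or of python-gtmetrix, and it relies solely on the state and error keys visible in the sample response.

```python
def scan_error(response_data):
    """Return the error message if the scan failed, otherwise None."""
    if response_data.get("state") == "error":
        return response_data.get("error") or "unknown error"
    return None

# Raw response copied from the debugging example above
sample = {
    'resources': {},
    'error': 'An error occurred fetching the page: HTTPS error: hostname verification failed',
    'results': {},
    'state': 'error',
}
message = scan_error(sample)
if message:
    print(f"scan failed: {message}")
```

A caller could use the returned message to fill the note column of the table instead of hanging or crashing on that site.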