The following queries.py file is the template code for COMP3005 Database Project 1, which will be used for the autograder.
This README is an important guide to effectively using this repository and finishing this project.
- Changed CSV writer to encoding UTF-8. This would allow students who do not have the UTF-8 locale to test this.
- queries.py
- This will be the file you will write your queries in AND submit.
- dbexport.sql
- This is an example of an exported database. This can be used for testing, for both you and myself.
This project, and in turn the autograder and starter code, uses psycopg3 on a v22.04 Ubuntu Linux environment.
As per the project guidelines,
"Design a database that stores a soccer events dataset spanning multiple competitions and seasons. The provided dataset is in JSON format and can be downloaded from https://github.com/statsbomb/open-data/tree/0067cae166a56aa80b2ef18f61e16158d6a7359a1. The documentation of the dataset is also available in the above URL. After designing the database, you need to import the data from the JSON files into your database."
Once your database has been designed, you are then tasked to export this database into an .sql file named "dbexport" - this can be accomplished using pg_dump.
Given that the previous two steps are accomplished, your task is to now ONLY write your SQL queries within the prompted space within each Q_# method (where # is the question number).
The queries.py file and your exported dbexport.sql file should be within the same directory.
The queries.py file includes the following code snippet,
root_database_name = "project_database"
query_database_name = "query_database"
db_username = 'postgres'
db_password = '1234'
db_host = 'localhost'
db_port = '5432'
To briefly explain, these variables are used to connect to the root database named "project_database" and your query execution database named "query_database." The code's process for each query is to create a database named "query_database", import your dbexport.sql file into this database, execute the query, and then drop the database (to avoid any alterations so results are not affected down the line). The reasoning for two databases is because the connection cannot drop a database it is currently connected to, hence the two databases - one for a root connection and one for query execution.
You may change these values to test on your end, but under no circumstance in the final deliverable should these initial values be different.
Expected Output:
While testing, your output will be the dbexport.sql's log, and then following are your query times. INC simply means incomplete.
As the autograder is also connecting to your databases, to reiterate, any change to the initial values of the connection variables will result in your code submission becoming void. You may change these values for your own testing purposes, (e.g, you have a different password), but do so at your discretion.
- What else will VOID your submission?
- Any additional submissions in your repository.
- Other print statements.
- Other alterations to code other than the query executions (please view ACADEMIC INTEGRITY below).
Your source code file(s) that maps and loads the existing JSON dataset from the JSON files into your database. This code must be stored in a directory named "json_loader". Therefore, in your submission repository, you are only submitting the script "queries.py", your dbexport.sql, ".gitattributes" when you import the dbexport.sql as an LFS, and the "json_loader" directory. Any additional submissions will void the entire code submission.
GitHub has a restriction to submission size - if any file exceeds 100.00 MB it simply cannot be added to your repository - so for this case, your dbexport.sql. Luckily, you may use Git LFS to your advantage. Git LFS (Large File Storage) allows the submission of these large files by using reference pointers to get this data - this will be beneficial for both you and the grader.
Steps:
-
Install git and git lfs in your Linux terminal, use the following commands to install both.
sudo apt-get install git-all
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
Next, you will set up your GitHub username and email,
git config --global user.name "Your Name"
git config --global user.email "Your Email"
-
Have a directory based on your submission, in other words, have a folder on your system containing all files you will have in your repository submission. This is both easier and organized for you, but also allows you to do the git commands without the worry of affected anything outside of it.
- Open the terminal with this directory, and then write the following line, this will create a HIDDEN .git folder in your folder:
git init
- Then, write the following line to track your dbexport into a smaller reference file:
git lfs track "*.sql"
-
Once this is done, you can then do the following lines of code in your terminal.
git branch -M main
git remote add origin "your submission .git link"
git add "your files that you will add to your submission"
- Example:
git add queries.py
- Example:
-
Once you add your files, you can check your additions to the commit by doing
git status
- BEFORE
- AFTER ADDING
-
Then, you can commit using
git commit -m "Commit Message"
-
And then finally, you can push to your repository,
git push -u origin main
-
The terminal would then prompt you for your GitHub username and password, however, as of August 13th, 2021, GitHub has removed account password authentication for terminal pushes. Therefore, please use this following article to set up your own authentication key.
-
It will loop and continously ask for your username and password based on how many files you are pushing!
-
Your following submission should then look like the following:
If you run into any fatal errors or bugs, please consult the closed issues first as it might have already been solved. If it hasn't been solved, and/or you also have questions, please feel free to create an open issue!. If need be, you can also shoot me an email at gabemartell@cmail.carleton.ca - although I would respond quicker to the GitHub issues.
Any alterations to the code, such as modifying the time, will be flagged for suspicion of cheating - and thus will be reviewed by the staff and, if need be, the Dean. To review the Integrity Violation Attributes of Carleton University, please view https://carleton.ca/registrar/academic-integrity/