INTEGRITY: Rewrite project in python #24

Open
wants to merge 119 commits into
base: integrity
from

Conversation

InariInDream

No description provided.

@lephilousophe added the GSoC (Part of a Google Summer of Code project) label on Jun 11, 2024
Member

@sev- left a comment


Very solid start. I left some questions and notes.

Also, please update the README with deployment instructions.

db_functions.py (outdated, resolved)
dat_parser.py (outdated, resolved)
dat_parser.py (outdated, resolved)
user_fileset_functions.py (outdated, resolved)
db_functions.py (outdated, resolved)
user = f'cli:{getpass.getuser()}'
create_log(escape_string(category_text), user, escape_string(log_text), conn)

def compare_filesets(id1, id2, conn):
Member


I don't think this will scale well, since you're essentially doing a quadratic computation.

I propose the following approach:

  • Pick a file from the incoming fileset.
  • Select the IDs of the filesets that contain the picked file(s) with a matching checksum. This should be a single SELECT.
  • If the list is too big, e.g. over 100 entries (unlikely), add one more file.
  • Iterate over the resulting files, or even let SQL do it with UNION ALL.

Eventually we will have around 15k filesets, so a quadratic computation is not feasible.
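A minimal sketch of the candidate lookup described above. It assumes a hypothetical `file` table with `fileset`, `name` and `checksum` columns, and uses `sqlite3` only so the example runs standalone; the project's real schema and database driver may differ, and `find_candidate_filesets` is an illustrative name, not an existing function in this PR. Each iteration is one indexed SELECT, and a further file is added only while more than 100 filesets still match, so nothing compares every pair of filesets.

```python
import sqlite3

# Threshold taken from the review comment ("over 100 entries").
MAX_CANDIDATES = 100


def find_candidate_filesets(conn, incoming_files, max_candidates=MAX_CANDIDATES):
    """Return sorted IDs of filesets that contain the incoming files.

    incoming_files: list of (name, checksum) pairs from the incoming fileset
    (hypothetical representation). Files are checked one at a time; another
    file is added only while more than max_candidates filesets still match.
    """
    candidates = None
    for name, checksum in incoming_files:
        # One SELECT per file; an index on (name, checksum) keeps it cheap.
        rows = conn.execute(
            "SELECT DISTINCT fileset FROM file WHERE name = ? AND checksum = ?",
            (name, checksum),
        ).fetchall()
        ids = {row[0] for row in rows}
        # A candidate must contain every file checked so far.
        candidates = ids if candidates is None else candidates & ids
        if len(candidates) <= max_candidates:
            break
    return sorted(candidates or ())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file (fileset INTEGER, name TEXT, checksum TEXT)")
    conn.execute("CREATE INDEX file_name_checksum ON file (name, checksum)")
    conn.executemany(
        "INSERT INTO file VALUES (?, ?, ?)",
        [
            (1, "game.exe", "aaa"), (1, "data.pak", "bbb"),
            (2, "game.exe", "aaa"), (2, "data.pak", "ccc"),
            (3, "other.exe", "ddd"),
        ],
    )
    incoming = [("game.exe", "aaa"), ("data.pak", "bbb")]
    print(find_candidate_filesets(conn, incoming))                    # [1, 2]
    print(find_candidate_filesets(conn, incoming, max_candidates=1))  # [1]
```

Only the short candidate list then needs a full file-by-file comparison (in Python or via UNION ALL); that final step is not shown here.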

megadata.py (resolved)
pagination.py (resolved)
Labels
GSoC Part of a Google Summer of Code project
Projects
None yet

3 participants