Update preprocessing scripts #2

Open
yashkurkure opened this issue Oct 30, 2024 · 3 comments · Fixed by #3
@yashkurkure
Collaborator

yashkurkure commented Oct 30, 2024

Preprocessing scripts

This will contain all the code, tests, and data used in preprocessing. Create a new directory called /preprocessing at the root of the project. This directory will contain the following:

  • folder data which contains the input data from the ANL website
  • script preprocessing_theta_23_24.py
  • script preprocessing_polaris_23_24.py
  • a README.md on how to use the preprocessing.py scripts

data

The data folder will contain the following:

preprocessing_theta_23_24.py

This will read the files ANL-ALCF-DJC-THETA_20230101_20231231.csv and ANL-ALCF-DJC-THETA_20240101_20240930.csv, then output a file called theta_23_24.swf.

  • first job id should begin with 1
  • file should be sorted by submit column
  • all entries should be integers (no decimal points in the swf)
  • ignore jobs whose used_proc is 0
  • ignore jobs whose req_proc is 0
  • ignore jobs whose req_time is 0
  • ignore jobs part of the debug queue
  • theta_23_24.swf should contain jobs in the range Sep 1, 2023 to Oct 7, 2024
  • Include the following header in theta_23_24.swf:
; UnixStartTime: 0
; MaxNodes: 4360
; MaxProcs: 4360
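
The filtering rules above can be sketched roughly as follows. This is only an illustration, not the script itself: it assumes each job has already been mapped to a dict keyed by SWF field names with numeric values, plus a hypothetical `queue` key holding the queue name, and that the debug queue is literally named "debug". The ALCF CSV column names are not given in this issue, so reading the CSV and the date-range filter are left out. `clean_theta_jobs` is a made-up name.

```python
def clean_theta_jobs(jobs):
    """Apply the Theta rules to a list of job dicts.

    Assumed (hypothetical) keys per job: submit, used_proc, req_proc,
    req_time, queue. Values are assumed to be numeric already.
    """
    kept = [
        j for j in jobs
        if int(j["used_proc"]) != 0      # ignore jobs whose used_proc is 0
        and int(j["req_proc"]) != 0      # ignore jobs whose req_proc is 0
        and int(j["req_time"]) != 0      # ignore jobs whose req_time is 0
        and j["queue"] != "debug"        # assumed name of the debug queue
    ]
    # File should be sorted by the submit column.
    kept.sort(key=lambda j: j["submit"])

    cleaned = []
    for new_id, job in enumerate(kept, start=1):  # first job id is 1
        # Cast every numeric field to int so no decimal points reach the swf.
        # int() truncates toward zero; rounding would be a different choice.
        row = {k: int(v) for k, v in job.items() if k != "queue"}
        row["id"] = new_id
        cleaned.append(row)
    return cleaned
```

Renumbering happens after filtering and sorting, so the ids stay contiguous and start at 1 regardless of which jobs were dropped.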

preprocessing_polaris_23_24.py

This will read the files ANL-ALCF-DJC-POLARIS_20230101_20231231.csv and ANL-ALCF-DJC-POLARIS_20240101_20240930.csv, then output a file called polaris_23_24.swf.

  • first job id should begin with 1
  • file should be sorted by submit column
  • all entries should be integers (no decimal points in the swf)
  • ignore jobs whose used_proc is 0
  • ignore jobs whose req_proc is 0
  • ignore jobs whose req_time is 0

These two requirements are specific to the Polaris trace:

  • ignore jobs that are part of the debug queue and whose req_proc <= 8
  • for jobs that are part of the debug queue and whose req_proc > 8, keep the job, but in the swf file req_proc should be the procs value from the ALCF file minus 8
  • polaris_23_24.swf should contain jobs in the range Sep 1, 2023 to Oct 7, 2024
  • Include the following header in polaris_23_24.swf:
; UnixStartTime: 0
; MaxNodes: 552
; MaxProcs: 552
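
The two Polaris-specific debug-queue rules could look something like the sketch below. The function name, the `debug_queue` name "debug", and the idea of returning `None` for a dropped job are assumptions made for illustration; the actual queue names in the ALCF files are not stated in this issue.

```python
def polaris_req_proc(req_proc, queue, debug_queue="debug", debug_procs=8):
    """Return the req_proc to write to the swf, or None if the job is dropped.

    debug_queue is an assumed queue name; debug_procs is the 8 from the
    rules above.
    """
    if queue != debug_queue:
        return req_proc              # non-debug jobs are left unchanged
    if req_proc <= debug_procs:
        return None                  # debug job with req_proc <= 8: ignore
    return req_proc - debug_procs    # debug job with req_proc > 8: subtract 8


print(polaris_req_proc(8, "debug"))    # None
print(polaris_req_proc(10, "debug"))   # 2
print(polaris_req_proc(4, "prod"))     # 4
```

Applying this before the req_proc == 0 check is harmless, since a kept debug job always ends up with req_proc of at least 1.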

SWF file structure

https://www.cs.huji.ac.il/labs/parallel/workload/swf.html

swf_columns = [
    'id',             #1
    'submit',         #2
    'wait',           #3
    'run',            #4
    'used_proc',      #5
    'used_ave_cpu',   #6
    'used_mem',       #7
    'req_proc',       #8
    'req_time',       #9
    'req_mem',        #10
    'status',         #11
    'user_id',        #12
    'group_id',       #13
    'num_exe',        #14
    'num_queue',      #15
    'num_part',       #16
    'num_pre',        #17
    'think_time',     #18
    ]
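
As a rough sketch of how rows might be written out in this column order, the snippet below joins the 18 fields with spaces and places the `;` header lines first. `format_swf` is a hypothetical helper, and filling missing fields with -1 follows the SWF convention for unknown values; whether the scripts should do that for every absent column is an assumption.

```python
SWF_COLUMNS = [
    "id", "submit", "wait", "run", "used_proc", "used_ave_cpu",
    "used_mem", "req_proc", "req_time", "req_mem", "status", "user_id",
    "group_id", "num_exe", "num_queue", "num_part", "num_pre", "think_time",
]


def format_swf(rows, header_lines):
    """Return swf text: header comment lines, then one line per job.

    rows is a list of dicts keyed by SWF column name. Missing fields are
    written as -1 (assumed here, following the SWF convention for unknown
    values). Every value is cast to int so no decimal points appear.
    """
    lines = list(header_lines)
    for row in rows:
        fields = [str(int(row.get(col, -1))) for col in SWF_COLUMNS]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


header = ["; UnixStartTime: 0", "; MaxNodes: 552", "; MaxProcs: 552"]
print(format_swf([{"id": 1, "submit": 0, "run": 60.0, "req_proc": 2}], header))
```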

@yashkurkure
Collaborator Author

@yashkurkure yashkurkure linked a pull request Nov 1, 2024 that will close this issue
@mochiiten9158
Owner

#4 (comment)

Preprocessing for theta and polaris

@mochiiten9158
Owner

Updated: New Pull Request with the correct preprocessing files
#5 (comment)

yashkurkure added a commit that referenced this issue Nov 2, 2024
Issue #2 Resolved by creating preprocessing scripts