Skip to content

dchasap/cwm-simulator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

cwm-simulator

A SLURM-like cluster workload manager simulator, for evaluating power-aware scehduling policies.

Simulator allows users to implement their own policies and test them. Also offers scripts to create their job load, given a number of jobs. The simulator uses application traces to run jobs (it does not actually run them).

  • Configuration Files: The following files need to created in the conf directory. These files describe the workload, the cluster and workload manager settings.

    • job-list: a list of jobs that are to be used to create a workload (used to generate traffic, not by the simulator itself). Should contain tuplets of job name and estimated execution time

    • host-list: a list of host machine names

    • job_profiler_manager.txt: setting options for job profiler (job profiler is used to process traces and make power predictions for jobs)

    • sched_default.txt: settings for scheduler options

    • traffic: a file describing traffic, used by gen_traffic.py to replay a traffic scenario and send jobs to be scheduled by the simulator.

  • Application Traces and predictions: Traces are list of tuplets that contain a timestamp and a corresponding power consumption. The actual exectuion time is also required. Example traces and statistics used for making power predictions can be found in the data/ folder.

  • Generate/Replay Traffic: ./script/gen_traffic.py This script is used to either generate a traffic load and/or replay it to send jobs to the simulator for execution. Without replaying, users need to manually send jobs. See .scripts/gen_traffic.py --help for more information.

  • Simulator: ./scripts/cluster_simulator.py Accepts incoming jobs and schedules them according to user implemented scheduling policies (some already implemented examples exist in scripts). Terminates when "quit" job is received. See ./scripts/cluster_simulator.py --help for more information.

An example of running the simulator with a given pre-generated traffic file.

./scripts/gen_traffic.py --job-list ./conf/jobs.txt \ --time-unit 1 \ --replay-traffic \ --traffic-infile ./conf/traffic_bursty.txt | \ ./scripts/cluster_simulator.py \ --host-list ./conf/hosts.txt \ -v --time-unit 1 \ --enable-statistics \ --stats-log stats.txt \ --stats-trace-file stats_trace.csv \ --log log.txt \ --scheduler naive \ # this is SLURM extended --global-power-budget 20000 \ --time-limit 3600 \ --job-power-cap 120