Obtain a gurobipy license (the commercial trial works for 30 days).
Install the required dependencies:

```shell
pip3 install pandas gurobipy numpy matplotlib
```

First, download and decompress the traces from Moirai-SOSP25-logs.
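After installation, a quick sanity check (a minimal stdlib-only sketch, not part of the Moirai tooling) confirms the required modules are importable:

```shell
python3 - <<'EOF'
import importlib.util
# Report whether each required module resolves in the current environment.
for mod in ("pandas", "gurobipy", "numpy", "matplotlib"):
    status = "found" if importlib.util.find_spec(mod) else "MISSING"
    print(f"{mod}: {status}")
EOF
```

Any module reported as MISSING should be reinstalled before proceeding.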
Second, organize the traces into the following folders:

- Table size files (`report-table-size-*.csv`): place under `Moirai/`
- Workload files (from 2024/10 onward, `report-abFP-volume-table-*.csv`): place under `Moirai/newTraces/`
- Job traces (`%Y%m%d-Presto.csv` or `%Y%m%d-Spark.csv`): place under `Moirai/jobTraces/`
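The layout above can be set up with a few moves (a minimal sketch; it assumes the decompressed CSVs sit in your current directory — adjust the source paths to wherever you extracted Moirai-SOSP25-logs):

```shell
# Create the expected folder layout, then sort the traces into it.
mkdir -p Moirai/newTraces Moirai/jobTraces
mv report-table-size-*.csv Moirai/
mv report-abFP-volume-table-*.csv Moirai/newTraces/
mv ./*-Presto.csv ./*-Spark.csv Moirai/jobTraces/
```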
Run the following command to verify outputs under `sample_0.050`:

```shell
python3 tests.py --test=samplek --c=30 --k=0.05 --num_week=2 --rep_rate=0.002 --Spark
```

Job traces in `jobTraces/` contain per-job information rather than aggregated optimizer traces. Run the scheduler after optimization:

```shell
python3 scheduler.py --c=30 --num_week=2 --opt_path="sample_0.050"
```

Note: this process takes ~30 minutes per week of job traces; since the example runs for two weeks, expect ~1 hour.
Another flag is `--simple`, which runs the scheduler without simulating the per-minute traffic rate. This saves time if you do not care about the traffic rate.
Note: re-running these commands will not overwrite existing results.
```shell
python3 tests.py --test=samplek --k=1 --num_week=13 --rep_rate=0.002 --Spark --c=30
python3 scheduler.py --num_week=13 --opt_path="sample_1.000" --c=30
python3 tests.py --test=long_term --Spark
python3 tests.py --test=reorg_unaware --Spark
```

Other useful flags (see more in `--help`):
- `tests.py`
  - `--view`: displays parameters without running the optimization.
  - `--opt_start_date`: specifies the start date for optimization (default: 2024-10-22).
- `scheduler.py`
  - `--debug`: runs a smaller subset of traces for debugging.
The `cputime` column in Spark traces (both under `jobTraces/` and `newTraces/`) already gives the job's total CPU time in seconds. Therefore, you should not sum these values to compute a job's total CPU time.
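To illustrate the pitfall (a minimal sketch on synthetic rows — the column layout and the assumption that a job can appear on multiple rows are illustrative, not taken from the real traces):

```python
import pandas as pd

# Synthetic stand-in for a Spark trace: a job may appear on several rows,
# but cputime on each row already holds that job's TOTAL CPU seconds.
rows = pd.DataFrame({
    "job_id": ["j1", "j1", "j2"],
    "cputime": [120.0, 120.0, 45.5],
})

# Wrong: rows["cputime"].sum() double-counts j1 (gives 285.5).
# Right: take one cputime value per job, then aggregate across jobs.
per_job = rows.groupby("job_id")["cputime"].first()
total_cpu = per_job.sum()
print(total_cpu)  # 165.5
```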
- Code related to `Yugong`, our baseline from VLDB 2019:

```shell
python3 tests.py --test=yugong --num_week=13 --rep_rate=0.004 --c=30
python3 scheduler.py --yugong --num_week=13 --opt_path="yugong_results" --c=30
```

Other baselines:
- Without pre-selecting replication, can we achieve enough speedup with sampling? Try k = 0.001, 0.01, 0.05, 0.1:

```shell
python3 tests.py --test=samplek --k=0.001 --num_week=13 --rep_rate=0 --Spark --c=30
python3 tests.py --test=samplek --k=0.01 --num_week=13 --rep_rate=0 --Spark --c=30
python3 tests.py --test=samplek --k=0.05 --num_week=13 --rep_rate=0 --Spark --c=30
python3 tests.py --test=samplek --k=0.1 --num_week=13 --rep_rate=0 --Spark --c=30
python3 scheduler.py --num_week=13 --opt_path="sample_0.050" --c=30
```

- How do other scheduling policies perform?
```shell
python3 scheduler.py --num_week=13 --opt_path="sample_1.000_rep0.002" --policy="size-aware" --c=30
python3 scheduler.py --num_week=13 --opt_path="sample_1.000_rep0.002" --policy="size-unaware" --simple --c=30
```

- How do other replication strategies perform?

```shell
python3 tests.py --test=samplek --k=1 --num_week=1 --rep_rate=0.002 --Spark --c=50 --rep_strategy="read_traffic_density"
```

- Baselines
```shell
python3 baselines.py --baseline="rep_x_month" --rep_rate=0.21 --c=30  # Rep 3 month
python3 baselines.py --baseline="rep_rtd" --rep_rate=0 --c=30         # No rep
```

- Customized test

```shell
python3 tests.py --test=samplek --k=1 --num_week=1 --rep_rate=0.001 --c=10 --opt_start_date="2025-03-04" --table_size_file="report-table-size-20250310.csv"
```