- Install the necessary packages. For more detailed installation instructions, please refer to OSWorld.
```bash
# python >= 3.9
pip install -r requirements.txt
pip install desktop-env
```
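To catch an unsupported interpreter before `pip` starts installing, you could check the Python version first. The sketch below is an illustration only. `version_at_least` and `python_version` are hypothetical helpers that are not part of the repository, and the comparison relies on `sort -V` (version ordering in GNU coreutils).

```shell
# Hypothetical helper (not part of the repository).
# version_at_least A B: succeed if version A >= version B.
# Relies on `sort -V`, which orders "3.9" before "3.10".
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: print the version of an interpreter (default: python3).
python_version() {
  "${1:-python3}" -c 'import platform; print(platform.python_version())'
}
```

For example, `version_at_least "$(python_version)" 3.9 || echo "Python 3.9+ is required"` can be run before the `pip install` commands.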
- Install Docker using the official instructions for your operating system. To verify the installation, run `docker -v`; if it prints the Docker version, Docker has been installed correctly.
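`docker -v` only confirms that the Docker client is installed. A slightly stricter check also asks the daemon for information with `docker info`, which fails if the daemon is stopped or the current user cannot access its socket. The helpers below are a hypothetical sketch, not part of the repository.

```shell
# Hypothetical helper (not part of the repository).
# require_cmd NAME: fail with a message if NAME is not on PATH.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "error: '$1' not found on PATH" >&2
    return 1
  fi
}

# Hypothetical helper: verify both the Docker client and the daemon.
check_docker() {
  require_cmd docker || return 1
  # Prints the client version, like the check described above.
  docker -v || return 1
  # `docker info` must reach the daemon, so it also catches a stopped
  # daemon or missing permissions on the Docker socket.
  if ! docker info >/dev/null 2>&1; then
    echo "error: cannot reach the Docker daemon" >&2
    return 1
  fi
}
```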
- Deploy the ScaleCUA models following our guidance, then record each model's name and URL.
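The example URLs in `eval.sh` end in `/v1`, which suggests an OpenAI-compatible server (for instance vLLM). Assuming that is how the model is deployed, the served model names can be read from the standard `GET <base>/models` route, which returns a JSON list of the form `{"data": [{"id": ...}]}`. The helpers below are a hypothetical sketch, not part of the repository; whether the listed id is exactly the value `--model` expects depends on your deployment.

```shell
# Hypothetical helpers (not part of the repository). Assumes an
# OpenAI-compatible server whose base URL ends in /v1.

# models_endpoint BASE: build the "list models" URL, tolerating a trailing slash.
models_endpoint() {
  echo "${1%/}/models"
}

# parse_model_ids: read an OpenAI-style model list on stdin and
# print one model id per line.
parse_model_ids() {
  python3 -c 'import json, sys; print("\n".join(m["id"] for m in json.load(sys.stdin)["data"]))'
}

# list_served_models BASE: query the server and print the served model names.
list_served_models() {
  curl -sf "$(models_endpoint "$1")" | parse_model_ids
}
```

For example, `list_served_models http://<host>:<port>/v1` prints the names to record alongside that URL.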
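- In `eval.sh`, replace the values of `--url_set` and `--model` with your actual model URL(s) and model name, respectively. Multiple URLs in `--url_set` are separated by commas.
- Start the evaluation using `bash eval.sh results/scalecua_3b_50step 2`, where the first argument is the results directory and the second is the number of parallel environments.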
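If you prefer to make the `eval.sh` edits from the command line, the following is a minimal sketch using `sed`. `set_eval_option` is a hypothetical helper, not part of the repository. It assumes each option starts its own line (as in the script) and that the new value contains no `|`, `&`, or backslash, which `sed` would treat specially.

```shell
# Hypothetical helper (not part of the repository).
# set_eval_option FILE OPTION VALUE: replace the value after "--OPTION "
# on lines that start with that option (comment lines are left alone),
# keeping the original file as FILE.bak.
set_eval_option() {
  local file="$1" opt="$2" value="$3"
  sed -i.bak -E "s|^([[:space:]]*--${opt}) [^ ]+|\1 ${value}|" "$file"
}
```

For example, `set_eval_option eval.sh model <your-model-name>` followed by `set_eval_option eval.sh url_set http://<host1>:<port1>/v1,http://<host2>:<port2>/v1`.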
```bash
#!/bin/bash
# =================================================================
# Script to run the multi-environment CUA agent.
#
# Usage:
# bash eval.sh <results_directory> <num_environments>
#
# Example:
# bash eval.sh ./results 16
# =================================================================

# Abort with a usage message if either argument is missing.
if [ "$#" -ne 2 ]; then
  echo "Usage: bash $0 <results_directory> <num_environments>" >&2
  exit 1
fi

# $1: Path to the results directory.
RESULT_DIR=$1
# $2: Number of parallel environments to run.
NUM_ENVS=$2

# Create the results directory; -p also creates missing parent directories.
mkdir -p "${RESULT_DIR}"

# Run the agent. Set --model and --url_set to match your own deployment
# (--url_set takes a comma-separated list of endpoints).
# stdout and stderr are both appended to run.log in the results directory.
python run_multienv_cua.py \
  --path_to_vm docker_vm_data/Ubuntu.qcow2 \
  --headless \
  --action_space pyautogui \
  --observation_type screenshot \
  --screen_width 1920 \
  --screen_height 1080 \
  --sleep_after_execution 2.0 \
  --max_steps 50 \
  --max_trajectory_length 100 \
  --temperature 0.0 \
  --top_p 0.9 \
  --max_tokens 1000 \
  --result_dir "${RESULT_DIR}" \
  --num_envs "${NUM_ENVS}" \
  --model scalecua \
  --url_set http://10.140.66.46:10029/v1,http://10.140.66.27:10025/v1 \
  2>&1 | tee -a "${RESULT_DIR}/run.log"
```
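Because `--url_set` packs several endpoints into one comma-separated string, a typo in a single entry is easy to miss. The sketch below splits the string and flags entries that do not look like `http(s)://host[:port]/v1`. That expected shape is an assumption taken from the example values above, and `check_url_set` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper (not part of the repository).
# check_url_set "URL1,URL2,...": print "ok" or "bad" for each entry and
# fail if any entry does not match http(s)://host[:port]/v1.
check_url_set() {
  local IFS=','
  local urls url status=0
  # Split the comma-separated string into an array.
  read -ra urls <<< "$1"
  for url in "${urls[@]}"; do
    if [[ "$url" =~ ^https?://[^/]+/v1/?$ ]]; then
      echo "ok   $url"
    else
      echo "bad  $url"
      status=1
    fi
  done
  return $status
}
```

For example, `check_url_set "http://10.140.66.46:10029/v1,http://10.140.66.27:10025/v1"` reports both entries as `ok`.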
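- Show the results for each domain. Pass the results directory used with `eval.sh` as `--result_folder` and the value of `--model` from `eval.sh` as `--model_name`:
```bash
python show_result.py --result_folder results/scalecua_3b_50step --model_name MODEL_NAME_IN_EVAL.SH
```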
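For a quick look at partial results while a run is still in progress, per-domain success rates can also be computed directly from the result files. This is a sketch under an assumption: it expects the OSWorld-style layout `<model_dir>/<domain>/<example_id>/result.txt`, with each `result.txt` holding one numeric score, so check your own results folder before relying on it. `summarize_results` is a hypothetical helper and does not replace `show_result.py`.

```shell
# Hypothetical helper (not part of the repository). Assumes the layout
#   <model_dir>/<domain>/<example_id>/result.txt
# where <model_dir> is e.g. results/scalecua_3b_50step/pyautogui/screenshot/<model>
# and each result.txt starts with one numeric score in [0, 1].
summarize_results() {
  local model_dir="$1"
  find "$model_dir" -mindepth 3 -maxdepth 3 -name result.txt | sort |
    while read -r f; do
      # The domain is the grandparent directory of result.txt.
      domain=$(basename "$(dirname "$(dirname "$f")")")
      printf '%s %s\n' "$domain" "$(head -n 1 "$f")"
    done |
    # Average the scores per domain and report them as percentages.
    awk '{ sum[$1] += $2; n[$1]++ }
         END { for (d in sum) printf "%s %.2f%% (%d tasks)\n", d, 100 * sum[d] / n[d], n[d] }' |
    sort
}
```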
This repository is based on OSWorld, and we have integrated ScaleCUA into its framework. We thank the OSWorld authors for their excellent work.