Welcome to BessBench, a benchmark for measuring how well foundation models perform in battery energy storage and home energy management scenarios. It scores models on seven dimensions across realistic energy settings.
BessBench uses a set of evaluation criteria inspired by ElecBench. Here are the metrics we assess:
- Expressiveness: How well can the model convey complex ideas?
- Factuality: Are the model's outputs correct and true?
- Logicality: Does the model follow a logical reasoning process?
- Stability: How consistent are the model's responses over time?
- Fairness: Does the model treat all scenarios impartially?
- Security: How well does the model handle security concerns?
- Agentic Abilities: Can the model carry out multi-step reasoning, long-term planning, and tool use?
We added a new dimension, agentic abilities, to reflect the need for large language models to act as agents in real-world tasks.
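As an illustration of how one of these dimensions could be turned into a number, the sketch below scores stability as the share of repeated runs that agree with the most common answer. This is an assumption for illustration only: the README does not say how BessBench computes stability, and `agreement_rate` is a hypothetical helper, not part of BessBench.

```python
from collections import Counter


def agreement_rate(responses):
    """Fraction of responses that match the most common response.

    Hypothetical stability measure; BessBench's actual scoring may differ.
    """
    if not responses:
        raise ValueError("need at least one response")
    # Ignore case and surrounding whitespace so trivial differences do not count.
    normalized = [r.strip().lower() for r in responses]
    most_common_count = Counter(normalized).most_common(1)[0][1]
    return most_common_count / len(normalized)


# Four runs of the same prompt; three give the same answer.
runs = ["Charge at night", "charge at night ", "Discharge at noon", "Charge at night"]
print(agreement_rate(runs))  # 0.75
```

Exact string matching is a deliberately simple choice. Free-text answers would likely need a looser comparison.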
BessBench is designed to be user-friendly for all levels of users. Here's how you can get started:
Ensure your system meets these basic requirements:
- Operating System: Windows, macOS, or a recent version of Linux.
- Memory: At least 4GB of RAM.
- Storage: 500MB of available disk space.
- Network: Internet access for downloading necessary files.
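Some of the requirements above can be checked before installing. The following is a minimal sketch using only the Python standard library. It covers the operating system and free disk space. The RAM check is left out because the standard library has no portable call for it. The function name `check_requirements` and the reading of 500MB as 500 MiB are assumptions, and the script is not part of BessBench.

```python
import platform
import shutil

# Threshold taken from the requirements list above (500MB, read here as 500 MiB).
MIN_FREE_BYTES = 500 * 1024 * 1024
# platform.system() reports macOS as "Darwin".
SUPPORTED_SYSTEMS = {"Windows", "Darwin", "Linux"}


def check_requirements(path=".", min_free_bytes=MIN_FREE_BYTES):
    """Return pass/fail flags for the OS and disk-space requirements."""
    free_bytes = shutil.disk_usage(path).free
    return {
        "os": platform.system() in SUPPORTED_SYSTEMS,
        "disk": free_bytes >= min_free_bytes,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'not met'}")
```

Pass the intended install location as `path` to check the drive BessBench will actually use.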
The latest version of BessBench is available from the project's Releases page.
To download BessBench:
- Go to the Releases page.
- Select the latest release version.
- Download the file suitable for your operating system.
- Run the downloaded file to install BessBench on your computer.
BessBench offers several features to make your benchmarking experience straightforward:
- User-Friendly Interface: Navigate effortlessly through the application.
- Realistic Scenarios: Benchmark across multiple energy-related scenarios.
- Detailed Reports: Get insightful reports on model performance.
- Modular Design: Customize your evaluation metrics based on your needs.
Once installed, follow these steps to run BessBench:
- Open the BessBench application.
- Choose the benchmarking scenario that suits your needs.
- Input your model or data for evaluation.
- Click "Run Benchmark" to start the evaluation process.
- Review your results and generated reports in the application.
BessBench uses a scenario-metrics structure for effective evaluation. The following scenarios are included:
- Energy Storage Technologies & Devices: Benchmark battery systems and inverter technologies.
- Home Energy Management: Evaluate models that manage energy use in residential settings.
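One way to picture the scenario-metrics structure is as a grid with one score per (scenario, metric) pair. The sketch below is a hypothetical data structure built from the scenario and metric names in this README. `ScoreGrid` and its methods are assumptions for illustration and do not describe BessBench's internal format.

```python
from dataclasses import dataclass, field
from statistics import mean

SCENARIOS = (
    "Energy Storage Technologies & Devices",
    "Home Energy Management",
)
METRICS = (
    "Expressiveness", "Factuality", "Logicality", "Stability",
    "Fairness", "Security", "Agentic Abilities",
)


@dataclass
class ScoreGrid:
    """Holds one score per (scenario, metric) pair (hypothetical structure)."""
    scores: dict = field(default_factory=dict)

    def record(self, scenario, metric, value):
        # Reject names that are not in the documented scenario and metric lists.
        if scenario not in SCENARIOS:
            raise ValueError(f"unknown scenario: {scenario}")
        if metric not in METRICS:
            raise ValueError(f"unknown metric: {metric}")
        self.scores[(scenario, metric)] = value

    def scenario_mean(self, scenario):
        """Average of the scores recorded so far for one scenario."""
        values = [v for (s, _), v in self.scores.items() if s == scenario]
        if not values:
            raise ValueError(f"no scores recorded for: {scenario}")
        return mean(values)


grid = ScoreGrid()
grid.record("Home Energy Management", "Factuality", 0.5)
grid.record("Home Energy Management", "Stability", 1.0)
print(grid.scenario_mean("Home Energy Management"))  # 0.75
```

Keying scores by the pair keeps scenarios and metrics independent, so adding a scenario or metric only means extending one tuple.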
We encourage users to participate in the BessBench community. If you have questions or need support:
- Check the Issues page for common questions or existing reports.
- Join discussions and offer feedback on how we can improve BessBench.
- Contribute to the project by submitting your own benchmarking scenarios or metrics.
BessBench is open-source software released under the MIT License. You can use, modify, and distribute it freely, provided the copyright and license notice are retained.
We thank the BessBench community and contributors for their support. Special thanks to the creators of ElecBench and HELM for their foundational work in this area.
Feel free to reach out at any time. Happy benchmarking!