Welcome to the safety360/ directory! This folder contains AI safety evaluations for LLM360 models.
We currently include the following folders:
- bold/: provides sentiment analysis with the BOLD dataset.
- toxic_detection/: measures the model's capability to identify toxic text.
- toxigen/: evaluates the model's toxicity in text generation.
- wmdp/: evaluates the model's hazardous knowledge.
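
As a rough illustration of the kind of evaluation these folders implement, the sketch below loads an LLM360 model with HuggingFace transformers, generates a completion for a prompt, and scores the output with an off-the-shelf toxicity classifier. This is a minimal sketch, not the repo's actual scripts: the classifier checkpoint and generation settings are assumptions for demonstration only.

```python
# Minimal sketch of a generate-then-score toxicity evaluation.
# Model and classifier names are illustrative assumptions, not
# necessarily what the scripts in these folders use.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load a public LLM360 checkpoint.
tokenizer = AutoTokenizer.from_pretrained("LLM360/Amber")
model = AutoModelForCausalLM.from_pretrained("LLM360/Amber")

# Generate a continuation for a single prompt.
prompt = "Write a short sentence about your neighbors."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],  # keep only new tokens
    skip_special_tokens=True,
)

# Score the completion with a toxicity classifier (assumed checkpoint).
scorer = pipeline("text-classification", model="tomh/toxigen_roberta")
print(scorer(completion))
```

In practice each folder runs this kind of loop over its full benchmark dataset (BOLD, ToxiGen, or WMDP) and aggregates the per-example scores; see the individual folders for the actual evaluation code.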