A lightweight CLI for scheduling LLM evaluations across multiple HPC clusters using SLURM job arrays and Singularity containers.
- Schedule evaluations on multiple models and tasks: `oellm schedule-eval`
- Collect results and check for missing evaluations: `oellm collect-results`
- Task groups for pre-defined evaluation suites with automatic dataset pre-downloading
- Multi-cluster support with auto-detection (Leonardo, LUMI, JURECA)
Prerequisites: Install uv
```shell
# Install the package
uv tool install -p 3.12 git+https://github.com/OpenEuroLLM/oellm-cli.git
```
```shell
# Run evaluations using a task group (recommended)
oellm schedule-eval \
    --models "microsoft/DialoGPT-medium,EleutherAI/pythia-160m" \
    --task_groups "open-sci-0.01"

# Or specify individual tasks
oellm schedule-eval \
    --models "EleutherAI/pythia-160m" \
    --tasks "hellaswag,mmlu" \
    --n_shot 5
```

This will automatically:
- Detect your current HPC cluster (Leonardo, LUMI, or JURECA)
- Download and cache the specified models
- Pre-download datasets for known tasks (see warning below)
- Generate and submit a SLURM job array with appropriate cluster-specific resources
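The generated job array maps each SLURM array index to one (model, task) row of the scheduler's jobs.csv. A minimal sketch of that pattern (illustrative only; not the script the CLI actually generates):

```shell
#!/bin/bash
# Illustrative job-array sketch; real #SBATCH directives (e.g. --array=0-1,
# partition, account) would appear here.

# Sample jobs.csv for demonstration:
cat > jobs.csv <<'EOF'
model_path,task_path,n_shot
EleutherAI/pythia-160m,hellaswag,5
EleutherAI/pythia-160m,mmlu,5
EOF

TASK_ID="${SLURM_ARRAY_TASK_ID:-0}"   # set by SLURM for each array element
row=$((TASK_ID + 2))                  # +2 skips the CSV header line
line="$(sed -n "${row}p" jobs.csv)"
model="${line%%,*}"                   # first CSV field
echo "array task $TASK_ID evaluates $model"
```

Outside SLURM the index defaults to 0, so the sketch can be run directly for testing.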
Task groups are pre-defined evaluation suites in task-groups.yaml. Each group specifies tasks, their n-shot settings, and HuggingFace dataset mappings.
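As an illustration, an entry in task-groups.yaml might look roughly like this (field names and layout are assumptions, not the file's actual schema):

```yaml
open-sci-0.01:
  tasks:
    - name: hellaswag
      n_shot: 5
      hf_dataset: Rowan/hellaswag
    - name: mmlu
      n_shot: 5
      hf_dataset: cais/mmlu
```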
Available task groups:
- `open-sci-0.01` - Standard benchmarks (COPA, MMLU, HellaSwag, ARC, etc.)
- `belebele-eu-5-shot` - Belebele European language tasks
- `flores-200-eu-to-eng` / `flores-200-eng-to-eu` - Translation tasks
- `global-mmlu-eu` - Global MMLU in EU languages
- `mgsm-eu` - Multilingual GSM benchmarks
- `generic-multilingual` - XWinograd, XCOPA, XStoryCloze
- `include` - INCLUDE benchmarks
Super groups combine multiple task groups:
- `oellm-multilingual` - All multilingual benchmarks combined
```shell
# Use a task group
oellm schedule-eval --models "model-name" --task_groups "open-sci-0.01"

# Use multiple task groups
oellm schedule-eval --models "model-name" --task_groups "belebele-eu-5-shot,global-mmlu-eu"

# Use a super group
oellm schedule-eval --models "model-name" --task_groups "oellm-multilingual"
```

Warning: Datasets are only automatically pre-downloaded for tasks defined in task-groups.yaml.
If you use custom tasks via `--tasks` that are not in the task-groups registry, the CLI will attempt to look them up but cannot guarantee the datasets will be cached. This may cause failures on compute nodes that don't have network access.
Recommendation: Use `--task_groups` when possible, or ensure your custom task datasets are already cached in `$HF_HOME` before scheduling.
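To verify that a custom task's data is already cached, you can list the dataset repos under the hub cache. A small helper sketch (not part of the CLI), assuming the standard HF hub cache layout of `hub/datasets--<org>--<name>`:

```shell
# List dataset repos already present in the HF hub cache (layout assumed).
cached_datasets() {            # usage: cached_datasets "$HF_HOME"
    ls "$1/hub" 2>/dev/null \
        | sed -n 's/^datasets--//p' \
        | sed 's/--/\//'       # "Rowan--hellaswag" -> "Rowan/hellaswag"
}

cached_datasets "${HF_HOME:-$HOME/.cache/huggingface}"
```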
After evaluations complete, collect results into a CSV:
```shell
# Basic collection
oellm collect-results /path/to/eval-output-dir
```
```shell
# Check for missing evaluations and create a CSV for re-running them
oellm collect-results /path/to/eval-output-dir --check --output_csv results.csv
```

The `--check` flag compares completed results against jobs.csv and outputs a results_missing.csv that can be used to re-schedule failed jobs.
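That comparison boils down to a set difference over CSV rows; a minimal sketch with sample data (illustrative only, not the CLI's implementation):

```shell
# Sample scheduled jobs and collected results:
printf 'model_path,task_path,n_shot\nm1,hellaswag,5\nm1,mmlu,5\n' > jobs.csv
printf 'model_path,task_path,n_shot\nm1,hellaswag,5\n' > results.csv

# Rows scheduled but not yet completed (headers stripped, rows sorted):
tail -n +2 jobs.csv | sort > jobs.sorted
tail -n +2 results.csv | sort > results.sorted
missing="$(comm -23 jobs.sorted results.sorted)"
echo "$missing"          # m1,mmlu,5
```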
Re-schedule them with:

```shell
oellm schedule-eval --eval_csv_path results_missing.csv
```

For full control, provide a CSV file with the columns model_path, task_path, n_shot, and optionally eval_suite:
```shell
oellm schedule-eval --eval_csv_path custom_evals.csv
```

Update to the latest version:
```shell
uv tool upgrade oellm
```

Due to limited space in $HOME on JSC clusters, set these environment variables:
```shell
export UV_CACHE_DIR="/p/project1/<project>/$USER/.cache/uv-cache"
export UV_INSTALL_DIR="/p/project1/<project>/$USER/.local"
export UV_PYTHON_INSTALL_DIR="/p/project1/<project>/$USER/.local/share/uv/python"
export UV_TOOL_DIR="/p/project1/<project>/$USER/.cache/uv-tool-cache"
```

Supported clusters: Leonardo, LUMI, and JURECA.
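Cluster auto-detection is typically hostname-based; a hypothetical sketch (the patterns here are assumptions, not the CLI's actual detection logic):

```shell
# Hypothetical hostname-based cluster detection.
detect_cluster() {
    case "$(printf '%s' "${1:-$(hostname)}" | tr '[:upper:]' '[:lower:]')" in
        *leonardo*) echo "leonardo" ;;
        *lumi*)     echo "lumi" ;;
        *jureca*)   echo "jureca" ;;
        *)          echo "unknown" ;;
    esac
}

detect_cluster "uan01.lumi.csc.fi"   # prints: lumi
```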
See all scheduling options with:

```shell
oellm schedule-eval --help
```

```shell
# Clone and install in dev mode
git clone https://github.com/OpenEuroLLM/oellm-cli.git
cd oellm-cli
uv sync --extra dev
```
```shell
# Run dataset validation tests
uv run pytest tests/test_datasets.py -v

# Download-only mode for testing
uv run oellm schedule-eval --models "EleutherAI/pythia-160m" --task_groups "open-sci-0.01" --download_only
```

Troubleshooting:

HuggingFace quota issues: Ensure you're logged in with `HF_TOKEN` and are a member of the OpenEuroLLM organization.
Dataset download failures on compute nodes: Use `--task_groups` for automatic dataset caching, or pre-download datasets manually before scheduling.