Harbor supports multiple cloud environment providers, allowing you to run hundreds or thousands of trials in parallel. This guide covers setup and usage for each supported provider.
Supported Providers
Harbor supports the following cloud execution environments:
Daytona - Fast container-based sandboxes with Docker-in-Docker support
Modal - Serverless containers with GPU support
E2B - Secure sandboxes with fast startup times
Runloop - Managed evaluation environments
GKE - Google Kubernetes Engine for large-scale deployments
Daytona
Daytona provides fast, ephemeral development environments perfect for agent evaluation.
Setup
Step 1: Get API Key
Sign up for Daytona and create an API key.
Step 2: Set Environment Variable
export DAYTONA_API_KEY=<YOUR-KEY>
harbor run --dataset terminal-bench@2.0 \
--agent claude-code \
--model anthropic/claude-opus-4-1 \
--env daytona \
--n-concurrent 100
Configuration Options
Daytona environments support several configuration options:
# Set auto-delete interval (minutes after stop)
harbor run -d terminal-bench@2.0 -a claude-code -m anthropic/claude-opus-4-1 \
--env daytona \
--ek auto_delete_interval_mins:int=30
# Set auto-stop interval (minutes of inactivity)
harbor run -d terminal-bench@2.0 -a claude-code -m anthropic/claude-opus-4-1 \
--env daytona \
--ek auto_stop_interval_mins:int=60
# Use a snapshot template for faster startup
harbor run -d terminal-bench@2.0 -a claude-code -m anthropic/claude-opus-4-1 \
--env daytona \
--ek snapshot_template_name="harbor__{name}__snapshot"
Docker Compose Support
Daytona automatically detects when your task uses Docker Compose and creates a Docker-in-Docker environment:
services:
  main:
    build: .
    volumes:
      - agent-logs:/logs/agent
      - verifier-logs:/logs/verifier
  mcp-server:
    image: my-mcp-server:latest
    ports:
      - "3000:3000"
volumes:
  agent-logs:
  verifier-logs:
When using Docker Compose, Daytona creates a DinD (Docker-in-Docker) sandbox and runs docker compose inside it. The main service is where agent commands execute.
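Conceptually, the detection amounts to checking whether the task directory ships a compose file. The sketch below is illustrative only; the file names checked and the branching are assumptions, not Harbor's actual implementation:

```shell
# Illustrative only: choose a DinD sandbox when the task ships a compose file.
# Harbor's real detection logic may differ.
task_dir=./my-task
mkdir -p "$task_dir" && touch "$task_dir/docker-compose.yaml"   # demo fixture

if [ -f "$task_dir/docker-compose.yaml" ] || [ -f "$task_dir/compose.yaml" ]; then
  echo "compose task -> DinD sandbox; agent commands run in the 'main' service"
else
  echo "single-container sandbox"
fi
```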
Advanced Configuration
# Disable network access
harbor run -d terminal-bench@2.0 -a claude-code -m anthropic/claude-opus-4-1 \
--env daytona \
--ek network_block_all:bool=true
# Use custom DinD image for compose tasks
harbor run -d terminal-bench@2.0 -a claude-code -m anthropic/claude-opus-4-1 \
--env daytona \
--ek dind_image="docker:28.3.3-dind"
# Use DinD snapshot for faster compose startup
harbor run -d terminal-bench@2.0 -a claude-code -m anthropic/claude-opus-4-1 \
--env daytona \
--ek dind_snapshot="my-dind-snapshot"
Modal
Modal provides serverless containers with excellent GPU support, ideal for ML-heavy evaluations.
Setup
harbor run --dataset terminal-bench@2.0 \
--agent claude-code \
--model anthropic/claude-opus-4-1 \
--env modal \
--n-concurrent 50
GPU Support
Modal is the recommended provider for GPU-enabled tasks:
# Run task with GPU
harbor run --tasks ./my-gpu-task \
--agent claude-code \
--model anthropic/claude-opus-4-1 \
--env modal
Your task configuration specifies GPU requirements:
[environment]
gpus = 1
gpu_types = ["a100", "h100"]
cpus = 8
memory = "32G"
Configuration Options
# Mount Modal secrets
harbor run -d terminal-bench@2.0 -a claude-code -m anthropic/claude-opus-4-1 \
--env modal \
--ek secrets='["my-secret-1", "my-secret-2"]'
# Mount Modal volumes
harbor run -d terminal-bench@2.0 -a claude-code -m anthropic/claude-opus-4-1 \
--env modal \
--ek volumes='{"path": "/data", "volume_name": "my-volume"}'
# Set sandbox timeout (seconds)
harbor run -d terminal-bench@2.0 -a claude-code -m anthropic/claude-opus-4-1 \
--env modal \
--ek sandbox_timeout_secs:int=7200
# Set idle timeout (seconds of inactivity)
harbor run -d terminal-bench@2.0 -a claude-code -m anthropic/claude-opus-4-1 \
--env modal \
--ek sandbox_idle_timeout_secs:int=300
E2B
E2B provides secure, fast-starting sandboxes with built-in internet isolation.
Setup
Step 1: Get API Key
Sign up at e2b.dev and obtain your API key.
Step 2: Set Environment Variable
export E2B_API_KEY=<YOUR-KEY>
harbor run --dataset terminal-bench@2.0 \
--agent claude-code \
--model anthropic/claude-opus-4-1 \
--env e2b \
--n-concurrent 75
Network Isolation
E2B enforces network isolation when it is specified in the task config:
[environment]
allow_internet = false
Runloop
Runloop provides managed environments optimized for agent evaluation.
Setup
Step 1: Install Runloop SDK
pip install runloop-api-client
Step 2: Set Environment Variable
export RUNLOOP_API_KEY=<YOUR-KEY>
harbor run --dataset terminal-bench@2.0 \
--agent claude-code \
--model anthropic/claude-opus-4-1 \
--env runloop \
--n-concurrent 50
Google Kubernetes Engine (GKE)
For large-scale enterprise deployments, Harbor supports GKE.
Setup
Ensure you have a GKE cluster running and kubectl configured.
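A typical way to point kubectl at an existing cluster is shown below; the cluster name and location are placeholders, and your project may use a zonal flag instead:

```shell
# Fetch cluster credentials into your kubeconfig
# (placeholders: replace cluster name and region with your own)
gcloud container clusters get-credentials my-cluster --region us-central1

# Sanity check: kubectl should now reach the cluster
kubectl get nodes
```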
gcloud auth application-default login
harbor run --dataset terminal-bench@2.0 \
--agent claude-code \
--model anthropic/claude-opus-4-1 \
--env gke \
--n-concurrent 200
GKE support is in beta. Contact the Harbor team for production deployment guidance.
Choosing a Provider
| Provider | Best For | GPU Support | Max Concurrency | Startup Time |
| --- | --- | --- | --- | --- |
| Daytona | General purpose, Docker Compose | No | 100+ | Fast |
| Modal | GPU workloads, ML tasks | Yes | 50+ | Medium |
| E2B | Security-sensitive, isolated | No | 75+ | Very Fast |
| Runloop | Managed environments | No | 50+ | Fast |
| GKE | Enterprise scale | Yes* | 200+ | Medium |
*GKE GPU support requires cluster configuration
Cost Optimization
Use Snapshots
Pre-build environment snapshots to reduce startup time and costs:
# Create snapshot (Daytona)
daytona snapshot create my-task-snapshot --from-sandbox <sandbox-id>
# Use snapshot in evaluation
harbor run -d terminal-bench@2.0 -a claude-code -m anthropic/claude-opus-4-1 \
--env daytona \
--ek snapshot_template_name="my-task-snapshot"
Optimize Concurrency
Higher concurrency completes faster but may hit rate limits:
# Conservative: fewer concurrent trials
harbor run -d terminal-bench@2.0 -a claude-code -m anthropic/claude-opus-4-1 \
--env daytona \
--n-concurrent 25
# Aggressive: more concurrent trials
harbor run -d terminal-bench@2.0 -a claude-code -m anthropic/claude-opus-4-1 \
--env daytona \
--n-concurrent 100
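Wall-clock time scales roughly inversely with concurrency until you hit provider or API rate limits. As a back-of-envelope sizing aid, with illustrative numbers (800 trials at ~10 minutes each; both figures are assumptions, not benchmarks):

```shell
# Back-of-envelope: total wall-clock ~ (trials / concurrency) * avg trial minutes
trials=800
avg_mins=10
for c in 25 100; do
  echo "n-concurrent=$c -> ~$(( trials * avg_mins / c )) minutes"
done
```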
Use Auto-Delete
Ensure environments are deleted promptly to avoid idle charges:
harbor run -d terminal-bench@2.0 -a claude-code -m anthropic/claude-opus-4-1 \
--env daytona \
--ek auto_delete_interval_mins:int=5
Troubleshooting
Rate Limits
If you hit provider rate limits, reduce concurrency:
harbor run -d terminal-bench@2.0 -a claude-code -m anthropic/claude-opus-4-1 \
--env daytona \
--n-concurrent 20 # Reduced from 100
Build Timeouts
Increase build timeout for complex Docker images:
harbor run -d terminal-bench@2.0 -a claude-code -m anthropic/claude-opus-4-1 \
--env modal \
--environment-build-timeout-multiplier 3.0
Network Issues
Check if your task requires internet access:
[environment]
allow_internet = true # Required for tasks that download dependencies
Best Practices
Test locally first: Run 1-2 tasks locally before scaling to cloud
Start with lower concurrency: Gradually increase to find optimal throughput
Use snapshots: Pre-build images to reduce startup time
Monitor costs: Track cloud provider spending
Set timeouts: Configure auto-stop and auto-delete to avoid idle charges
Choose the right provider: Match provider capabilities to your task requirements
Next Steps
Running Evaluations - Learn the basics of running evaluations
Parallel Execution - Optimize parallel execution strategies
Creating Tasks - Build tasks optimized for cloud execution