Troubleshooting¶

Common issues and solutions when running Loom.

Setup & Configuration¶

`loom setup` can't detect Ollama¶

Symptom: Setup wizard reports "Ollama not detected" even though Ollama is running.

Fix:

Check Ollama is running: curl http://localhost:11434/api/tags
If Ollama is on a different port or host, enter the URL when prompted
Check firewall isn't blocking port 11434
If using Docker Ollama: docker run -p 11434:11434 ollama/ollama

`loom setup` Anthropic key validation fails¶

Symptom: Setup reports "Key validation failed" after entering an API key.

Fix:

Double-check the key starts with sk-ant-
Verify network connectivity to api.anthropic.com
The key is saved anyway — validation is best-effort
Test manually: curl -H "x-api-key: sk-ant-..." https://api.anthropic.com/v1/models

Config file not picked up¶

Symptom: Settings from ~/.loom/config.yaml don't take effect.

Fix:

Check the file exists: cat ~/.loom/config.yaml
Env vars override config file values — check for conflicting OLLAMA_URL, ANTHROPIC_API_KEY
Priority: CLI flags > env vars > config.yaml > defaults
See Configuration for the full priority chain

RAG Pipeline¶

`loom rag ingest` fails with "No valid exports found"¶

Symptom: Ingest exits immediately without processing any files.

Fix:

Verify files exist: ls /path/to/exports/result*.json
Telegram exports must be JSON format (not HTML)
Use Telegram Desktop → Export Chat → JSON format
File paths are passed as arguments: loom rag ingest file1.json file2.json

Embedding fails during `loom rag ingest`¶

Symptom: Ingest hangs or errors at the "Storing chunks" step.

Fix:

Check Ollama is running: curl http://localhost:11434/api/tags
Check embedding model is installed: ollama list | grep nomic-embed-text
Pull the model: ollama pull nomic-embed-text
Use --no-embed to skip embeddings: loom rag ingest --no-embed files...
Check Ollama URL in config: cat ~/.loom/config.yaml | grep ollama_url

`loom rag search` returns no results¶

Symptom: Search returns "No results found" even after ingesting data.

Fix:

Check the store has data: loom rag stats
If you ingested with --no-embed, search won't work (embeddings required)
Re-ingest with embeddings: loom rag ingest --embed files...
Lower the score threshold: loom rag search "query" --min-score 0.0
Check you're using the same store path: loom rag --db-path /path/to/store.duckdb search "query"

LanceDB import errors¶

Symptom: ImportError: No module named 'lancedb' when using --store lancedb.

Fix:

uv sync --extra lancedb

NATS Connection¶

Cannot connect to NATS¶

Symptom: Actor exits immediately with bus.connected never appearing in logs, or error Could not connect to server.

Fix:

# Check if NATS is running
nats-server --version  # Should print version
curl -s http://localhost:8222/varz | head -5  # NATS monitoring endpoint

# Start NATS via Docker (quickest)
docker run -d --name nats -p 4222:4222 -p 8222:8222 nats:latest

# Or via Homebrew (macOS)
brew install nats-server
nats-server &

# Or via Docker Compose (full stack)
docker compose up -d

NATS connection drops intermittently¶

Symptom: Log shows bus.disconnected followed by bus.reconnected (or actor crash after 60s of retries).

Fix:

Check NATS server resource usage (nats-server memory, disk, connections)
Increase NATS max payload if sending large messages: nats-server --max_payload 4MB
If behind a load balancer, ensure idle timeout exceeds NATS ping interval (default 2 min)
Check network stability between client and NATS server

Messages silently dropped¶

Symptom: Tasks published but no worker picks them up. No errors in logs.

Cause: NATS uses at-most-once delivery. If no subscriber is listening when a message is published, it is silently dropped.

Fix:

Ensure workers are running before publishing tasks
Start actors in the right order: workers → router → orchestrator/pipeline
Check that worker_type in the task matches the worker's subscription (case-sensitive)
Check loom.tasks.dead_letter for unroutable tasks: loom dead-letter monitor

Workers¶

Worker produces empty or invalid output¶

Symptom: Worker completes but output doesn't match output_schema. Downstream stages fail with validation errors.

Fix:

Check the worker's system prompt — it must instruct the LLM to output valid JSON matching the schema
Use the Workshop test bench to test the worker in isolation: loom workshop --port 8080
Enable trace logging to see full I/O: LOOM_TRACE=1 loom worker --config ...
Verify the LLM backend is responding correctly (try a direct API call)

ANTHROPIC_API_KEY not set¶

Symptom: Workers using standard or frontier tier fail with authentication errors.

Fix:

export ANTHROPIC_API_KEY=sk-ant-...
# Or add to shell profile:
echo 'export ANTHROPIC_API_KEY=sk-ant-...' >> ~/.zshrc

OLLAMA_URL not set / Ollama not running¶

Symptom: Workers using local tier fail to connect.

Fix:

# Install and start Ollama
brew install ollama  # macOS
ollama serve &

# Set URL (default is http://localhost:11434)
export OLLAMA_URL=http://localhost:11434

# Pull a model
ollama pull llama3.2

Worker hangs or times out¶

Symptom: Worker never completes. Pipeline shows PipelineTimeoutError.

Fix:

Check LLM backend is responsive (try a direct API call)
Increase timeout_seconds in the stage config if the task is legitimately slow
For Ollama, check if the model is still loading (ollama ps)
Check if the worker is stuck in a tool-use loop (max 10 rounds by default)

Pipelines¶

PipelineMappingError: key not found in context¶

Symptom: Stage 'X' mapping error: Path 'Y.output.Z': key 'Z' not found in context

Cause: A stage's input_mapping references a field that the previous stage didn't produce.

Fix:

Check the upstream stage's output_schema — does it include the field?
Test the upstream worker in Workshop to see its actual output
If the field is optional, add a condition to skip the downstream stage when it's missing

PipelineValidationError: input/output validation failed¶

Symptom: Stage fails before or after execution with schema validation errors.

Fix:

Check input_schema / output_schema in the stage config
Use Workshop test bench to verify the worker's actual output format
Common issue: schema says "type": "integer" but worker outputs a string number

Circular dependency detected¶

Symptom: Pipeline fails to start with ValueError: Circular dependency detected among stages.

Fix:

Check input_mapping paths — stage A referencing stage B and B referencing A creates a cycle
Use depends_on to override automatic dependency inference if needed
Visualize the dependency graph in Workshop's pipeline editor

Router¶

Tasks going to dead letter¶

Symptom: Tasks appear in loom.tasks.dead_letter instead of reaching workers.

Cause: Router can't find a matching route for the worker_type + model_tier combination.

Fix:

Check configs/router_rules.yaml for tier overrides
Verify the worker_type in the task matches a running worker's config name
Check rate limits — rate-limited tasks may be dead-lettered
Monitor dead letters: loom dead-letter monitor --nats-url nats://localhost:4222

Workshop¶

Workshop won't start¶

Symptom: loom workshop fails with import errors.

Fix:

# Install workshop dependencies
uv sync --extra workshop

# Or all extras
uv sync --all-extras

App deployment fails¶

Symptom: ZIP upload returns error during app deployment.

Fix:

Verify ZIP contains manifest.yaml at the root (not in a subdirectory)
Check manifest fields: name, version, description are required
Ensure all config files referenced in entry_configs exist in the ZIP
ZIP must not contain symlinks or paths with ..
Build the ZIP using the app's scripts/build-app.sh for correct structure

Docker / Kubernetes¶

Container can't reach NATS¶

Symptom: Containers fail to connect to nats://nats:4222.

Fix:

In Docker Compose: services use the service name as hostname (nats)
Standalone Docker: use --network host or link containers
In Kubernetes: verify the NATS service is in the same namespace
Check: docker exec <container> nslookup nats

Workshop not accessible from host¶

Symptom: Workshop runs but browser can't reach it.

Fix:

Bind to 0.0.0.0 not 127.0.0.1: loom workshop --host 0.0.0.0 --port 8080
Docker: expose the port: -p 8080:8080
Kubernetes: use NodePort (30080) or port-forward: kubectl port-forward svc/loom-workshop 8080:8080

macOS Service (launchd)¶

Services not starting after install¶

Fix:

# Check service status
launchctl list | grep loom

# Check logs
cat ~/Library/Logs/loom/workshop.err
cat ~/Library/Logs/loom/router.err

# Reload services
launchctl unload ~/Library/LaunchAgents/com.loom.workshop.plist
launchctl load ~/Library/LaunchAgents/com.loom.workshop.plist

Permission denied¶

Fix:

launchd user agents don't need sudo — run as your user
If loom binary is in a restricted path, move it or adjust the plist

Windows Service (NSSM)¶

Services not starting¶

Fix:

# Check service status
nssm status LoomWorkshop
nssm status LoomRouter

# Check logs
Get-Content "$env:LOCALAPPDATA\loom\logs\workshop.err"

# Restart
nssm restart LoomWorkshop

NSSM not found¶

Fix:

# Install via Chocolatey
choco install nssm

# Or download from https://nssm.cc/download

Performance¶

Pipeline is slow¶

Fix:

Design stages with independent dependencies so they run in parallel
Scale workers horizontally via NATS queue groups (run multiple instances)
Set max_concurrent_goals in pipeline config for concurrent goal processing
Check token usage logs (worker.llm_usage) for expensive stages

High memory usage¶

Fix:

Workers are stateless and reset() between tasks — check for leaked references
DuckDB stores can grow large — monitor disk usage
Dead-letter consumer has a bounded store (default 1000 entries) — adjust max_size if needed
Valkey checkpoint store: check TTL settings for expired entries

Workshop Evaluation¶

LLM judge gives inconsistent scores¶

Symptom: Same test case produces different scores across eval runs.

Fix:

Set temperature=0.0 for the judge backend (this is the default)
Use a more capable model for judging (the Workshop prefers the standard tier)
Provide a more specific judge_prompt tailored to your domain
Check score_details.reasoning in eval results to understand scoring rationale

Baseline comparison shows no data¶

Symptom: Eval detail page shows no baseline comparison.

Fix:

Promote a successful eval run as baseline first: click "Promote as Baseline" on the eval detail page
Each worker has at most one baseline; promoting a new one replaces the previous
Baseline comparison is only shown when viewing a run that is not the baseline itself

Dead-letter replay keeps failing¶

Symptom: Replayed tasks end up back in the dead-letter queue.

Fix:

Check the original reason in the replay log — if "unroutable", ensure a worker for that worker_type + tier is running
If "rate_limited", wait for the rate limiter bucket to refill before replaying
If "malformed", the task data itself is invalid and needs to be fixed at the source

TUI Dashboard¶

TUI won't start¶

Symptom: loom ui fails with import errors.

Fix:

# Install TUI dependencies
uv sync --extra tui

# Or all extras
uv sync --all-extras

TUI shows "NATS connection failed"¶

Symptom: Dashboard starts but shows a red "disconnected" status and an error in the Events log.

Fix:

Check that NATS is running: nats-server --version or docker ps | grep nats
Verify the URL: loom ui --nats-url nats://localhost:4222 (default)
Check for firewall rules blocking port 4222
The TUI needs NATS running — it subscribes to loom.> to observe traffic

TUI shows no events¶

Symptom: Dashboard is connected (green status) but no goals, tasks, or events appear.

Fix:

The TUI is a passive observer — it only shows traffic that occurs while it's running
Submit a goal or run a pipeline to generate traffic
Check that actors (router, workers, orchestrator/pipeline) are running
The TUI subscribes to loom.> which catches all Loom NATS subjects

Pipeline stages not appearing¶

Symptom: Goals and tasks appear but the Pipeline tab is empty.

Fix:

Pipeline stage data comes from _timeline in result output — only pipeline orchestrators produce this
Dynamic orchestrators (OrchestratorActor) don't produce timeline data; use pipeline orchestrators for stage visibility
Check that the pipeline is producing results (look in the Events tab for loom.results.* messages)

Distributed Tracing (OpenTelemetry)¶

Tracing not producing spans¶

Symptom: No spans appear in your tracing backend (Jaeger, Zipkin, Tempo).

Fix:

# Install OTel dependencies
uv sync --extra otel

# Verify installation
python -c "from opentelemetry import trace; print('OTel available')"

Then initialize tracing at startup:

from loom.tracing import init_tracing
init_tracing(service_name="loom")

Or set the standard OTel environment variables:

export OTEL_SERVICE_NAME=loom
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

Spans not linking across actors¶

Symptom: You see separate root spans for each actor instead of a connected trace.

Fix:

Loom propagates trace context via a _trace_context key in NATS messages (W3C traceparent format)
Both the sender and receiver must have OTel installed for propagation to work
Check that inject_trace_context() and extract_trace_context() are being called (they are in BaseActor._process_one() and all publish methods)
If using custom actors, ensure you call inject_trace_context(data) before publishing and extract_trace_context(data) when receiving

OTel not installed but code imports it¶

Symptom: Worried about import errors when OTel is not installed.

Fix:

This is handled automatically. The loom.tracing module uses runtime feature detection — if OTel SDK is not installed, all functions become no-ops. No code changes needed. See Design Invariant #9.

LLM spans missing prompt/completion text¶

Symptom: LLM call spans appear in your tracing backend but contain no prompt or completion content.

Fix:

Set the LOOM_TRACE_CONTENT environment variable to enable prompt and completion recording:

export LOOM_TRACE_CONTENT=1

With this enabled, LLM call spans include two span events:

gen_ai.content.prompt — the full prompt sent to the model
gen_ai.content.completion — the model's response text

This is disabled by default to avoid storing sensitive data in your tracing backend.

Note: Even without LOOM_TRACE_CONTENT, all LLM call spans include these gen_ai.* attributes per the OpenTelemetry GenAI semantic conventions:

Attribute	Description
`gen_ai.system`	LLM provider (e.g., `anthropic`, `ollama`)
`gen_ai.request.model`	Model requested (e.g., `claude-sonnet-4-20250514`)
`gen_ai.response.model`	Model that served the request
`gen_ai.usage.input_tokens`	Prompt token count
`gen_ai.usage.output_tokens`	Completion token count
`gen_ai.request.temperature`	Sampling temperature
`gen_ai.request.max_tokens`	Max output tokens requested

Troubleshooting¶

Setup & Configuration¶

loom setup can't detect Ollama¶

loom setup Anthropic key validation fails¶

Config file not picked up¶

RAG Pipeline¶

loom rag ingest fails with "No valid exports found"¶

Embedding fails during loom rag ingest¶

loom rag search returns no results¶

LanceDB import errors¶

NATS Connection¶

Cannot connect to NATS¶

NATS connection drops intermittently¶

Messages silently dropped¶

Workers¶

Worker produces empty or invalid output¶

ANTHROPIC_API_KEY not set¶

OLLAMA_URL not set / Ollama not running¶

Worker hangs or times out¶

Pipelines¶

PipelineMappingError: key not found in context¶

PipelineValidationError: input/output validation failed¶

Circular dependency detected¶

Router¶

Tasks going to dead letter¶

Workshop¶

Workshop won't start¶

App deployment fails¶

Docker / Kubernetes¶

Container can't reach NATS¶

Workshop not accessible from host¶

macOS Service (launchd)¶

Services not starting after install¶

Permission denied¶

Windows Service (NSSM)¶

Services not starting¶

NSSM not found¶

Performance¶

Pipeline is slow¶

High memory usage¶

Workshop Evaluation¶

LLM judge gives inconsistent scores¶

Baseline comparison shows no data¶

Dead-letter replay keeps failing¶

TUI Dashboard¶

TUI won't start¶

TUI shows "NATS connection failed"¶

TUI shows no events¶

Pipeline stages not appearing¶

Distributed Tracing (OpenTelemetry)¶

Tracing not producing spans¶

Spans not linking across actors¶

OTel not installed but code imports it¶

LLM spans missing prompt/completion text¶

`loom setup` can't detect Ollama¶

`loom setup` Anthropic key validation fails¶

`loom rag ingest` fails with "No valid exports found"¶

Embedding fails during `loom rag ingest`¶

`loom rag search` returns no results¶