Workers Reference¶

Loom ships with six ready-made LLM workers and a document extraction module. Use them directly, chain them into pipelines, or use them as templates for your own workers.

Quick Start¶

# Use a shipped worker with the Workshop test bench
uv run loom workshop --port 8080
# Navigate to Workers → summarizer → Test

# Or chain them into a pipeline interactively
uv run loom new pipeline

LLM Workers¶

summarizer¶

Compresses text into a structured summary with key points.

Field	Value
Config	`configs/workers/summarizer.yaml`
Tier	`local` (Ollama)
Timeout	30s

Input:

{
  "text": "The text to summarize...",
  "max_points": 5,
  "focus": "economic impact"
}

Output:

{
  "summary": "2-3 sentence overview",
  "key_points": ["point 1", "point 2"],
  "word_count_original": 1200,
  "word_count_summary": 80
}

max_points and focus are optional. Summary is at least 70% shorter than input.

classifier¶

Assigns text to one of provided categories with confidence scoring.

Field	Value
Config	`configs/workers/classifier.yaml`
Tier	`local` (Ollama)
Timeout	20s

Input:

{
  "text": "Article text...",
  "categories": ["politics", "economics", "sports", "technology"],
  "category_descriptions": {
    "politics": "Government policy, elections, diplomacy"
  }
}

Output:

{
  "category": "politics",
  "confidence": 0.87,
  "reasoning": "The text discusses parliamentary elections..."
}

Categories are passed at runtime — the worker is generic. category_descriptions is optional but improves accuracy. Requires at least 2 categories.

extractor¶

Pulls structured fields from unstructured text.

Field	Value
Config	`configs/workers/extractor.yaml`
Tier	`standard` (Claude Sonnet)
Timeout	45s

Input:

{
  "text": "Contract between Acme Corp and...",
  "fields": [
    {"name": "parties", "description": "Contracting parties", "type": "list", "required": true},
    {"name": "effective_date", "description": "Contract start date", "type": "date"},
    {"name": "value", "description": "Total contract value", "type": "number"}
  ]
}

Output:

{
  "extracted": {
    "parties": {"value": ["Acme Corp", "Widget Inc"], "source_quote": "between Acme Corp and Widget Inc"},
    "effective_date": {"value": "2026-01-15", "source_quote": "effective January 15, 2026"},
    "value": {"value": 50000, "source_quote": "$50,000 total"}
  },
  "missing_required": []
}

Supported types: string, number, date, list, boolean. Each extracted field includes the source quote from the text.

translator¶

Multi-language translation with automatic source language detection.

Field	Value
Config	`configs/workers/translator.yaml`
Tier	`local` (Ollama)
Timeout	60s

Input:

{
  "text": "متن فارسی برای ترجمه",
  "target_language": "English",
  "source_language": "Persian"
}

Output:

{
  "translated_text": "Persian text for translation",
  "source_language": "Persian",
  "target_language": "English",
  "confidence": 0.92
}

source_language is optional — auto-detected if omitted. Preserves paragraph structure and proper nouns. If text is already in the target language, returns it unchanged with confidence 1.0.

qa¶

Question answering over provided context with source citations. Designed for RAG pipelines: retrieve chunks via vector search, pass them as context.

Field	Value
Config	`configs/workers/qa.yaml`
Tier	`local` (Ollama)
Timeout	45s

Input:

{
  "question": "What was the magnitude of the earthquake?",
  "context": "A 6.2 magnitude earthquake struck southeastern Iran on...",
  "answer_style": "concise"
}

Output:

{
  "answer": "The earthquake was magnitude 6.2.",
  "confidence": 1.0,
  "source_quotes": ["A 6.2 magnitude earthquake struck"],
  "answerable": true
}

Answers ONLY from provided context — no outside knowledge. Sets answerable to false when context is insufficient. source_quotes are exact substrings. answer_style options: concise (default), detailed, bullet_points.

RAG pipeline integration:

# 1. Search for relevant chunks
results=$(loom rag search "earthquake damage" --limit 5)

# 2. Pass results as context to QA worker (via Workshop test bench or pipeline)

reviewer¶

Quality review of content against configurable criteria. Generalized from the blind audit pattern used in production analytical pipelines.

Field	Value
Config	`configs/workers/reviewer.yaml`
Tier	`standard` (Claude Sonnet)
Timeout	90s

Input:

{
  "content": "Analysis text to review...",
  "criteria": ["accuracy", "completeness", "clarity", "bias"],
  "context": "This is a policy brief on energy subsidies",
  "severity_threshold": 0.3
}

Output:

{
  "overall_score": 0.78,
  "overall_pass": true,
  "scores": {
    "accuracy": {"score": 0.9, "assessment": "Claims are well-sourced"},
    "completeness": {"score": 0.6, "assessment": "Missing cost analysis"},
    "clarity": {"score": 0.85, "assessment": "Well-structured"},
    "bias": {"score": 0.75, "assessment": "Slight framing bias in section 3"}
  },
  "issues": [
    {
      "criterion": "completeness",
      "severity": 0.7,
      "description": "No cost-benefit analysis included",
      "suggestion": "Add estimated fiscal impact of subsidy changes",
      "quote": "subsidies should be reformed"
    }
  ],
  "strengths": ["Clear structure", "Good use of primary sources"]
}

criteria can be any evaluation dimensions — the reviewer adapts. context provides background on what the content should achieve. Only issues above severity_threshold are reported. Uses standard tier for stronger reasoning.

Document Processing (`contrib/docproc`)¶

Three extraction backends for PDF, DOCX, and other document formats. All produce the same output contract (ExtractorOutput), so downstream steps work unchanged regardless of which backend runs.

MarkItDownBackend¶

Fast, lightweight extraction via Microsoft MarkItDown. No ML models, no torch dependency. Best for well-structured digital documents.

# Worker config
processing_backend: "loom.contrib.docproc.markitdown_backend.MarkItDownBackend"

Supports: PDF, DOCX, PPTX, XLSX, HTML, plain text. Cannot: OCR scanned PDFs or extract complex table structures.

DoclingBackend¶

Deep extraction via IBM Docling with OCR, table structure recognition, and layout analysis. Requires torch.

processing_backend: "loom.contrib.docproc.docling_backend.DoclingBackend"

Supports: Scanned PDFs, complex layouts, multi-column documents. Config options: device (mps/cpu/cuda), ocr_engine (ocrmac/easyocr/tesseract), num_threads, layout_batch_size, ocr_batch_size.

SmartExtractorBackend (recommended)¶

Composite: tries MarkItDown first, falls back to Docling when needed. Optimizes for speed without sacrificing accuracy on difficult documents.

processing_backend: "loom.contrib.docproc.smart_extractor.SmartExtractorBackend"

Fallback triggers:

MarkItDown produces less than 50 characters (likely a scanned document)
MarkItDown raises an error
File extension is in force_docling_extensions list

Reports model_used: "markitdown" or "docling" so you know which path ran.

Extraction output¶

All backends produce:

{
  "file_ref": "document_extracted.json",
  "page_count": 12,
  "has_tables": true,
  "sections": ["Introduction", "Methods", "Results"],
  "text_preview": "First ~500 words..."
}

Full extracted text is written to the workspace directory (not passed through messages). Downstream steps access it via file_ref.

Example Pipelines¶

Translate → Summarize¶

pipeline_stages:
  - name: "translate"
    worker_type: "translator"
    input_mapping:
      text: "goal.context.text"
      target_language: "'English'"

  - name: "summarize"
    worker_type: "summarizer"
    input_mapping:
      text: "translate.output.translated_text"

Extract → Review¶

pipeline_stages:
  - name: "extract"
    worker_type: "extractor"
    input_mapping:
      text: "goal.context.document_text"
      fields: "goal.context.extraction_fields"

  - name: "review"
    worker_type: "reviewer"
    input_mapping:
      content: "extract.output.extracted"
      criteria: "'[\"accuracy\", \"completeness\"]'"

Document Processing Pipeline¶

pipeline_stages:
  - name: "extract"
    worker_type: "doc_extractor"
    tier: "local"
    input_mapping:
      file_ref: "goal.context.file_ref"

  - name: "classify"
    worker_type: "classifier"
    input_mapping:
      text: "extract.output.text_preview"
      categories: "'[\"report\", \"invoice\", \"contract\", \"memo\", \"other\"]'"

  - name: "summarize"
    worker_type: "summarizer"
    input_mapping:
      text: "extract.output.text_preview"

Creating Custom Workers¶

Use loom new worker for interactive scaffolding, or write YAML manually. See Building Workflows for the full guide.

The existing workers in configs/workers/ serve as templates — copy one, modify the system prompt and schemas, and you have a new worker.

Workers Reference¶

Quick Start¶

LLM Workers¶

summarizer¶

classifier¶

extractor¶

translator¶

qa¶

reviewer¶

Document Processing (contrib/docproc)¶

MarkItDownBackend¶

DoclingBackend¶

SmartExtractorBackend (recommended)¶

Extraction output¶

Example Pipelines¶

Translate → Summarize¶

Extract → Review¶

Document Processing Pipeline¶

Creating Custom Workers¶

Document Processing (`contrib/docproc`)¶