Track B: Helm Deployment and Session Automation

Instruction document for Claude Code — v0.3.0 baseline

This document describes the work to be done across the loom, baft, and framework repositories. It is written as a specification for Claude Code to execute, not as end-user documentation. Read it fully before starting any implementation.


Current state (as of 2026-03-23)

Repository versions

Repo        Version       Tag      CI                    Notes
loom        0.8.0         v0.8.0   Green                 MkDocs docs, 90% coverage, 1472 tests
baft        0.3.0         v0.3.0   Green                 356 tests, contracts package, DeepEval
docman      0.5.0         v0.5.0   Green                 MarkItDown + Docling backends
framework   unversioned   none     Green (validate.yml)  YAML database, 22 modules, 17 briefs

Existing infrastructure

  • loom/docker-compose.yml — NATS + Valkey + Workshop + Router
  • loom/docker/ — 4 Dockerfiles (orchestrator, workshop, router, worker)
  • loom/k8s/ — 8 Kustomize manifests (namespace, NATS, Redis, orchestrator, router, worker, workshop, kustomization.yaml)
  • baft/docs/SETUP.md — Complete local installation guide (12 steps)
  • baft/scripts/run_workers.sh — Worker launcher script
  • framework/scripts/setup.sh — Framework env setup
  • framework/scripts/watch_session_log.sh — File watcher for Chat integration

What does NOT exist yet

  • No Helm chart anywhere
  • No session start/stop automation (manual git pull, manual DuckDB import)
  • No framework sync checking during sessions
  • No Claude project file for Chat-based session management
  • No environment prerequisite checker beyond loom preflight

Part 1: Helm chart for local/cluster deployment

Goal

helm install baft ./charts/baft deploys the entire ITP analytical system including framework git sync, DuckDB import, NATS, workers, router, pipelines, Workshop, and MCP gateway.

Where to create it

Create charts/baft/ in the baft repository (not loom). Baft is the application — loom is the framework. The chart packages the baft application deployment.

Chart structure

charts/baft/
├── Chart.yaml
├── values.yaml
├── templates/
│   ├── _helpers.tpl
│   ├── namespace.yaml
│   ├── secrets.yaml
│   ├── configmap-env.yaml
│   │
│   ├── # Infrastructure
│   ├── nats-deployment.yaml
│   ├── nats-service.yaml
│   ├── valkey-deployment.yaml        # optional (redis checkpoint store)
│   ├── valkey-service.yaml
│   ├── valkey-pvc.yaml
│   │
│   ├── # Framework git sync
│   ├── framework-pvc.yaml            # PersistentVolumeClaim for framework clone
│   ├── framework-sync-deployment.yaml # git-sync + commit-agent sidecars
│   │
│   ├── # DuckDB
│   ├── duckdb-pvc.yaml               # Persistent DuckDB storage
│   ├── duckdb-import-job.yaml         # Initial full import
│   ├── duckdb-import-cronjob.yaml     # Periodic incremental import
│   │
│   ├── # Ollama (optional subchart or external)
│   ├── ollama-deployment.yaml
│   ├── ollama-service.yaml
│   │
│   ├── # Core actors
│   ├── router-deployment.yaml
│   ├── worker-deployment.yaml         # One Deployment per worker (13 total)
│   ├── pipeline-deployment.yaml       # 3 pipeline orchestrators
│   ├── scheduler-deployment.yaml
│   │
│   ├── # User-facing services
│   ├── workshop-deployment.yaml
│   ├── workshop-service.yaml
│   ├── mcp-deployment.yaml
│   ├── mcp-service.yaml
│   │
│   ├── # Optional: observability
│   ├── jaeger-deployment.yaml
│   ├── jaeger-service.yaml
│   │
│   └── # Ingress (optional)
│       └── ingress.yaml
└── README.md                          # Helm chart usage documentation
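
A minimal Chart.yaml consistent with this layout (the description text and version numbers are placeholders to adjust):

```yaml
apiVersion: v2
name: baft
description: ITP analytical system on Loom (workers, pipelines, Workshop, MCP gateway)
type: application
version: 0.1.0        # chart version, bumped independently of the app
appVersion: "0.3.0"   # tracks the baft release
```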

values.yaml design

# -- Global settings
namespace: baft
image:
  registry: ghcr.io/irantransitionproject
  tag: v0.3.0
  pullPolicy: IfNotPresent

# -- Framework git sync
framework:
  repo: "https://github.com/IranTransitionProject/framework.git"
  branch: main
  # For SSH: use "git@github.com:IranTransitionProject/framework.git"
  # and set framework.sshKeySecret
  sshKeySecret: ""              # K8s Secret name containing id_rsa
  gitTokenSecret: ""            # K8s Secret name containing GITHUB_TOKEN
  syncInterval: 60              # Seconds between git pull
  commitAgent:
    enabled: true
    interval: 900               # Seconds between commit+push (15 min)
    message: "Auto-commit: analytical session updates"

# -- DuckDB import
duckdb:
  importSchedule: "*/30 * * * *"  # Incremental import every 30 min
  storage: 5Gi

# -- LLM backends
anthropic:
  apiKeySecret: baft-api-keys     # K8s Secret name
  apiKeyField: ANTHROPIC_API_KEY

ollama:
  enabled: true
  model: "llama3.2:3b"
  image: ollama/ollama:latest
  gpu:
    enabled: false
    type: nvidia                  # nvidia or amd
    count: 1
  storage: 10Gi                   # Model storage PVC
  # External Ollama (when enabled: false)
  externalUrl: ""                 # e.g. "http://ollama.internal:11434"

# -- NATS
nats:
  image: nats:2.10-alpine
  monitoring: true                # Enable HTTP monitoring on :8222

# -- Valkey (Redis-compatible checkpoint store)
valkey:
  enabled: true
  image: valkey/valkey:8-alpine
  storage: 1Gi

# -- Workers
workers:
  # Default resource limits (override per-worker below)
  resources:
    requests:
      memory: 128Mi
      cpu: 100m
    limits:
      memory: 512Mi
      cpu: 500m
  # Per-worker settings
  sp: { replicas: 1 }
  ia: { replicas: 1 }             # Frontier tier — expensive
  de: { replicas: 1 }             # MUST be 1 (serialize_writes)
  xv: { replicas: 1 }
  in: { replicas: 1 }
  tn: { replicas: 1 }
  la: { replicas: 1 }
  pa: { replicas: 1 }
  rt: { replicas: 1 }             # Frontier tier — expensive
  as: { replicas: 1 }
  sa: { replicas: 1 }
  wt: { replicas: 1 }
  ni: { replicas: 1 }

# -- Router
router:
  replicas: 1

# -- Pipeline orchestrators
pipelines:
  quick: { enabled: true, replicas: 1 }
  standard: { enabled: true, replicas: 1 }
  audit: { enabled: true, replicas: 1 }

# -- Scheduler
scheduler:
  enabled: true

# -- Workshop UI
workshop:
  enabled: true
  replicas: 1
  service:
    type: NodePort
    port: 8080
    nodePort: 30080
  ingress:
    enabled: false
    host: workshop.local
    tls: false

# -- MCP Gateway
mcp:
  enabled: true
  transport: streamable-http
  port: 8765
  service:
    type: ClusterIP

# -- Observability
jaeger:
  enabled: false
  image: jaegertracing/jaeger:latest

# -- Environment (injected into all pods)
env:
  ITP_ROOT: /data/framework
  BAFT_WORKSPACE: /data/workspace
  NATS_URL: nats://nats:4222

Framework sync architecture

Use the official registry.k8s.io/git-sync/git-sync:v4 container image.

Deployment: framework-sync

Pod:
  initContainer: git-sync (one-shot clone)
  containers:
    - git-sync (continuous pull every syncInterval seconds)
    - commit-agent (cron loop: git add -A && git diff --cached --quiet || git commit && git push)
  volumes:
    - framework-data PVC (ReadWriteOnce)

Workers mount the framework PVC as read-only via subPath. The DE worker and DuckDB import job also mount it read-only — they write to the DuckDB PVC, not to the framework volume.

The commit-agent sidecar is a minimal shell script:

#!/bin/sh
while true; do
  sleep "$COMMIT_INTERVAL"
  cd /data/framework || continue
  git add -A
  if ! git diff --cached --quiet; then
    git commit -m "$COMMIT_MESSAGE ($(date -u +%Y-%m-%dT%H:%M:%SZ))"
    git push || echo "Push failed — will retry next cycle"
  fi
done

Mount git credentials from the Secret specified in framework.sshKeySecret or framework.gitTokenSecret.
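
A sketch of the two sync containers in framework-sync-deployment.yaml. The git-sync flags shown (--repo, --ref, --period, --root) are the standard v4 flags, but verify them against the pinned git-sync release; the commit-agent image reference assumes the naming scheme in values.yaml:

```yaml
containers:
  - name: git-sync
    image: registry.k8s.io/git-sync/git-sync:v4.2.1   # pin an exact v4 tag
    args:
      - --repo={{ .Values.framework.repo }}
      - --ref={{ .Values.framework.branch }}
      - --period={{ .Values.framework.syncInterval }}s
      - --root=/data/framework
    volumeMounts:
      - name: framework-data
        mountPath: /data/framework
  - name: commit-agent
    image: "{{ .Values.image.registry }}/commit-agent:{{ .Values.image.tag }}"
    env:
      - name: COMMIT_INTERVAL
        value: "{{ .Values.framework.commitAgent.interval }}"
      - name: COMMIT_MESSAGE
        value: "{{ .Values.framework.commitAgent.message }}"
    volumeMounts:
      - name: framework-data
        mountPath: /data/framework
```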

Container images needed

The existing Dockerfiles in loom/docker/ need to be adapted for baft. Create new Dockerfiles in baft/docker/:

baft/docker/
├── Dockerfile.worker       # Base worker image (loom + baft installed)
├── Dockerfile.router       # Router image
├── Dockerfile.pipeline     # Pipeline orchestrator image
├── Dockerfile.workshop     # Workshop UI image
├── Dockerfile.mcp          # MCP gateway image
├── Dockerfile.import       # DuckDB import job image
└── Dockerfile.commit-agent # Git commit sidecar (alpine + git)

All application images should:

  1. Start from python:3.12-slim
  2. Install uv
  3. Copy loom source and install it
  4. Copy baft source and install it
  5. Copy configs/ directory
  6. Set ENTRYPOINT to the appropriate loom CLI command

The commit-agent image is just alpine/git with the shell script above.
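
The six build steps above can be sketched for Dockerfile.worker as follows. The COPY paths and the ENTRYPOINT command are assumptions; adjust them to the actual repo layout and the real loom worker invocation:

```dockerfile
FROM python:3.12-slim

# uv handles dependency installation
RUN pip install --no-cache-dir uv

WORKDIR /app

# Install loom first (framework), then baft (application)
COPY loom/ /app/loom/
RUN uv pip install --system /app/loom

COPY baft/ /app/baft/
RUN uv pip install --system /app/baft

# Worker/pipeline configs
COPY baft/configs/ /app/configs/

# Assumed loom CLI invocation; replace with the real worker command
ENTRYPOINT ["loom", "worker"]
```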

Implementation order

  1. Create charts/baft/Chart.yaml and values.yaml
  2. Create templates/_helpers.tpl with standard label/selector helpers
  3. Create infrastructure templates (NATS, Valkey, secrets, configmap)
  4. Create framework-sync templates (PVC, deployment with git-sync + commit-agent)
  5. Create DuckDB templates (PVC, import job, import cronjob)
  6. Create worker templates (one Deployment per worker, parameterized from values)
  7. Create router, pipeline, scheduler templates
  8. Create Workshop and MCP templates
  9. Create optional Ollama and Jaeger templates
  10. Create Dockerfiles in baft/docker/
  11. Test with helm template (dry-run validation)
  12. Test with helm install --dry-run
  13. Document in charts/baft/README.md

Key constraints

  • DE must be replicas: 1 — serialize_writes is a hard invariant
  • Framework PVC must be ReadWriteOnce — only the framework-sync pod writes
  • Workers mount framework as read-only — via volumeMount.readOnly: true
  • DuckDB PVC is ReadWriteOnce — only DE and import jobs write to it
  • Ollama needs GPU scheduling if gpu.enabled: true — use nvidia.com/gpu or amd.com/gpu resource requests
  • Secrets must never be in values.yaml — always reference K8s Secrets
  • All images tagged with baft version — not latest
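
The DE constraint can be enforced at template-render time with Helm's fail function, for example at the top of worker-deployment.yaml:

```yaml
{{- /* Hard invariant: DE serializes framework writes */ -}}
{{- if ne (int .Values.workers.de.replicas) 1 }}
{{- fail "workers.de.replicas must be exactly 1 — serialize_writes is a hard invariant" }}
{{- end }}
```

With this in place, helm template and helm install both abort with a clear message instead of silently deploying a second DE replica.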

Part 2: Session automation

Goal

Automate the repetitive parts of starting and ending analytical sessions:

  1. Session start: pull framework, check for upstream changes, incremental DuckDB import, verify services, register session
  2. Session end: commit framework changes, push, unregister session
  3. During session: periodically check if framework remote has new commits (someone else pushed), warn the analyst

Implementation: loom session CLI commands

Add to baft/src/baft/cli.py (new file) or extend loom's CLI via a plugin.

Preferred approach: Create a baft CLI that wraps loom commands and adds session automation. This keeps baft-specific logic out of loom.

baft session start [--session-id NAME]
baft session end [--message "description"]
baft session status
baft session sync-check

baft session start

  1. cd $ITP_ROOT/framework && git pull --ff-only
  2. If pull fails (merge conflict), warn and abort
  3. cd $ITP_ROOT/baft && uv run python pipeline/scripts/itp_import_to_duckdb.py --incremental
  4. Run loom preflight equivalent checks:
       • NATS reachable at $NATS_URL
       • Ollama reachable at $OLLAMA_URL
       • ANTHROPIC_API_KEY is set and non-empty
       • DuckDB file exists at $BAFT_WORKSPACE/itp.duckdb
       • Framework directory exists at $ITP_ROOT/framework/data/
  5. Register session: baft.sessions.register_session(session_id)
  6. Print summary: "Session started. Framework at commit [hash]. DuckDB updated. All services reachable."

baft session end

  1. Unregister session: baft.sessions.unregister_session(session_id)
  2. cd $ITP_ROOT/framework
  3. git add -A
  4. If there are changes:
       • git commit -m "Session [session_id]: [message] — [date]"
       • git push
       • Print: "Framework changes committed and pushed."
  5. If no changes:
       • Print: "No framework changes to commit."

baft session status

  1. Show active sessions from baft.sessions.get_active_sessions()
  2. Show framework git status (clean/dirty, current commit, behind remote?)
  3. Show service health (NATS, Ollama, DuckDB file age)

baft session sync-check

  1. cd $ITP_ROOT/framework && git fetch origin --quiet
  2. Compare HEAD with origin/main:
       • If behind: warn "Framework has N new commits from remote. Run baft session sync to pull and re-import."
       • If ahead: info "You have N local commits not yet pushed."
       • If diverged: warn "Framework has diverged from remote. Manual resolution needed."
       • If up-to-date: "Framework is current."

This command should be callable on a schedule (see Part 3 below).
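
The four outcomes above reduce to comparing ahead/behind commit counts, which git reports with rev-list --left-right --count. A sketch of the classification logic (function names are illustrative, not an existing baft API):

```python
import subprocess


def classify_sync_state(ahead: int, behind: int) -> str:
    """Map ahead/behind commit counts to a sync-check message."""
    if ahead and behind:
        return "Framework has diverged from remote. Manual resolution needed."
    if behind:
        return (f"Framework has {behind} new commits from remote. "
                "Run baft session sync to pull and re-import.")
    if ahead:
        return f"You have {ahead} local commits not yet pushed."
    return "Framework is current."


def sync_check(repo_dir: str) -> str:
    """Fetch and classify; assumes a checkout tracking origin/main."""
    subprocess.run(["git", "fetch", "origin", "--quiet"], cwd=repo_dir, check=True)
    out = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", "HEAD...origin/main"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    ahead, behind = (int(n) for n in out.split())  # "<ahead>\t<behind>"
    return classify_sync_state(ahead, behind)
```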

baft session sync

  1. git pull --ff-only (abort if conflict)
  2. uv run python pipeline/scripts/itp_import_to_duckdb.py --incremental
  3. Print: "Synced to [commit]. DuckDB updated."

Implementation details

Create these files:

baft/src/baft/cli.py          # Click CLI group with session commands
baft/pyproject.toml            # Add [project.scripts] baft = "baft.cli:main"
baft/tests/test_session_cli.py # Tests for session commands

The CLI should use Click (already a loom dependency). Each command should be testable with click.testing.CliRunner.

Environment checks should use loom.cli.preflight internals where possible rather than reimplementing them.

Git operations should use subprocess.run(["git", ...]) with proper error handling — do NOT use gitpython or any git library. Keep it simple.
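
A sketch of the subprocess-based git helper the session commands might share (names are illustrative). Surfacing stderr on failure gives the CLI clear error messages without pulling in a git library:

```python
import subprocess


class GitError(RuntimeError):
    """Raised when a git command exits non-zero."""


def git(args: list[str], cwd: str) -> str:
    """Run git in cwd and return stripped stdout; raise GitError with stderr on failure."""
    proc = subprocess.run(["git", *args], cwd=cwd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise GitError(proc.stderr.strip() or f"git {' '.join(args)} failed")
    return proc.stdout.strip()


def pull_ff_only(repo_dir: str) -> str:
    """git pull --ff-only; a GitError here usually means divergence or a conflict."""
    return git(["pull", "--ff-only"], cwd=repo_dir)
```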


Part 3: Claude Chat project setup for session management

Goal

When an analyst opens a Claude Chat project that points at the ITP repos, Claude should be able to:

  1. Guide them through session start (or do it automatically via MCP)
  2. Periodically check for framework sync during the session
  3. Help them commit and push at session end

Claude project file

Create baft/.claude/project.md (or wherever Claude projects read from) with instructions that tell Claude Chat how to manage sessions.

Important: Claude Chat does NOT have direct CLI access. It works through MCP tools. The session automation must be exposed as MCP tools so Chat can invoke them.

New MCP tools for session management

Add these tools to the baft MCP gateway config (configs/mcp/itp.yaml):

tools:
  workshop:
    # ... existing workshop tools ...
  session:
    enable: [start, end, status, sync_check, sync]

And implement them in loom's MCP workshop bridge or as a new session bridge:

session.start       — Pull framework, import DuckDB, check services, register
session.end         — Commit framework, push, unregister
session.status      — Active sessions, git status, service health
session.sync_check  — Check if remote framework has new commits
session.sync        — Pull framework + incremental DuckDB import

These tools call the same logic as the baft session CLI commands.
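
Because the MCP tools and the CLI share logic, the session bridge can be a thin dispatch table over the same functions. A sketch (the registry and handler names are assumptions, not loom's actual bridge API):

```python
from typing import Any, Callable

# Shared session logic, imported by both the CLI and the MCP bridge.
# The handlers below are stand-ins for the real baft.sessions functions.
SESSION_TOOLS: dict[str, Callable[..., dict[str, Any]]] = {}


def session_tool(name: str):
    """Register a handler under its MCP tool name (e.g. 'session.status')."""
    def register(fn: Callable[..., dict[str, Any]]):
        SESSION_TOOLS[f"session.{name}"] = fn
        return fn
    return register


@session_tool("status")
def status(**kwargs) -> dict[str, Any]:
    # Real implementation: active sessions + git status + service health
    return {"active_sessions": [], "git": "clean", "services": "ok"}


def dispatch(tool: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Entry point the MCP server would call for session.* tools."""
    if tool not in SESSION_TOOLS:
        raise KeyError(f"Unknown session tool: {tool}")
    return SESSION_TOOLS[tool](**arguments)
```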

Claude Chat instructions

Create baft/docs/CLAUDE_CHAT_SESSION_INSTRUCTIONS.md:

# ITP Session Management — Instructions for Claude

You are assisting an ITP analyst. The analytical system runs on the Loom
framework with 13 specialized workers connected via NATS.

## At session start

When the analyst starts a new session or says they want to begin work:

1. Call `session.start` to initialize the session
2. If any checks fail, report them clearly and suggest fixes
3. Confirm: "Session [id] is active. Framework is at [commit].
   All services are operational."

## During the session

Every 15 minutes (or when the analyst asks), call `session.sync_check`:

- If the framework has new remote commits, tell the analyst:
  "The framework has been updated by another session. Would you like
  me to sync now, or continue with the current version?"
- If they say yes, call `session.sync`

## At session end

When the analyst says they're done or wants to end the session:

1. Ask: "Would you like me to commit the framework changes from this
   session? If so, please provide a brief description."
2. Call `session.end` with their message
3. Confirm the commit hash and that push succeeded

## Prerequisites to verify

Before any session operations, verify:

- The MCP connection to the baft server is active (you can call tools)
- The `session.*` tools are available in the tool list
- If tools are missing, tell the analyst: "The session management tools
  are not available. Please ensure the MCP server is running with
  `uv run loom mcp --config configs/mcp/itp.yaml`"

## Error handling

- If `session.start` reports NATS is unreachable: "NATS is not running.
  Start it with: `docker start nats-itp`"
- If `session.start` reports Ollama is unreachable: "Ollama is not
  running. Start it with: `ollama serve`"
- If `session.end` push fails: "Push failed — this usually means
  the remote has new commits. Let me check..." then call sync_check
- If framework has diverged: "The framework has diverged from remote.
  This needs manual resolution. Open a terminal and run
  `cd $ITP_ROOT/framework && git status` to see the conflict."

Claude project configuration

The analyst's Claude project should have:

  1. Project instructions that point to the session management doc
  2. MCP server connection to the baft MCP gateway
  3. Knowledge files (optional): ANALYST_GUIDE.md, SETUP.md

The project instructions file should be minimal — just bootstrap to the full instructions:

# ITP Analytical Engine

This project connects to the ITP analytical system via MCP.

## Setup

Ensure the MCP server is running:
`uv run loom mcp --config configs/mcp/itp.yaml --transport streamable-http --port 8765`

## Session management

Follow the instructions in `docs/CLAUDE_CHAT_SESSION_INSTRUCTIONS.md`
for session start, sync checking, and session end procedures.

## Available tools

- `itp.*` — Analytical pipeline tools (process sources, analyze, validate, update)
- `workshop.*` — Worker testing, evaluation, config management
- `session.*` — Session lifecycle (start, end, status, sync)
- DuckDB query tools — Search and filter the analytical database

Part 4: Environment prerequisite checker

Goal

A single command that validates the entire environment is ready for analytical work, with clear fix instructions for each failure.

Implementation

Extend baft session start (and the session.start MCP tool) to run a comprehensive check. But also make it available standalone:

baft preflight

Checks (in order):

  1. Python version — >= 3.11
  2. uv installed — which uv
  3. Repos present — $ITP_ROOT/framework, $ITP_ROOT/loom, $ITP_ROOT/baft exist
  4. Dependencies installed — uv run python -c "import loom; import baft"
  5. Environment variables — ITP_ROOT, BAFT_WORKSPACE, ANTHROPIC_API_KEY set
  6. NATS reachable — TCP connect to $NATS_URL
  7. Ollama reachable — HTTP GET $OLLAMA_URL/api/tags
  8. Ollama model present — Expected model in tag list
  9. DuckDB file exists — $BAFT_WORKSPACE/itp.duckdb
  10. DuckDB not stale — mtime < 24 hours (warn if older)
  11. Framework git clean — no uncommitted changes (warn only)
  12. Framework not behind remote — git fetch && git rev-list --count HEAD..origin/main

Output format:

ITP Preflight Check
───────────────────
[OK] Python 3.12.4
[OK] uv 0.6.3
[OK] Repos: framework, loom, baft
[OK] Dependencies installed
[OK] Environment variables set
[OK] NATS reachable (localhost:4222)
[OK] Ollama reachable (localhost:11434)
[OK] Ollama model: llama3.2:3b
[OK] DuckDB exists (14.2 MB, updated 2h ago)
[WARN] Framework has uncommitted changes (3 files)
[OK] Framework up-to-date with remote

11/12 checks passed, 1 warning
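
Output like the above can come from a small runner where each check returns a status and message; FAIL blocks the session, WARN is reported but does not. A sketch with illustrative names:

```python
from typing import Callable

OK, WARN, FAIL = "OK", "WARN", "FAIL"


def run_preflight(
    checks: list[tuple[str, Callable[[], tuple[str, str]]]]
) -> tuple[list[str], bool]:
    """Run checks in order; return report lines and overall pass/fail."""
    lines, ok, warn, fail = [], 0, 0, 0
    for name, check in checks:
        try:
            status, message = check()
        except Exception as exc:  # a crashed check counts as a failure
            status, message = FAIL, f"{name}: {exc}"
        lines.append(f"[{status}] {message}")
        if status == OK:
            ok += 1
        elif status == WARN:
            warn += 1
        else:
            fail += 1
    summary = f"{ok}/{len(checks)} checks passed"
    if warn:
        summary += f", {warn} warning{'s' if warn > 1 else ''}"
    if fail:
        summary += f", {fail} failed"
    lines.append(summary)
    return lines, fail == 0
```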

Implementation sequence

Phase 1: Session CLI + preflight (do first)

  1. Create baft/src/baft/cli.py with Click commands
  2. Add [project.scripts] entry in pyproject.toml
  3. Implement baft preflight
  4. Implement baft session start/end/status/sync-check/sync
  5. Write tests with CliRunner
  6. Update CLAUDE.md

Phase 2: Session MCP tools

  1. Create loom/src/loom/mcp/session_bridge.py (or extend workshop_bridge)
  2. Add session tool discovery to workshop_discovery.py
  3. Wire into MCP server dispatch
  4. Write tests
  5. Update configs/mcp/itp.yaml

Phase 3: Claude Chat project setup

  1. Create baft/docs/CLAUDE_CHAT_SESSION_INSTRUCTIONS.md
  2. Create project instructions file
  3. Test with Claude Chat + MCP connection
  4. Document in SETUP.md

Phase 4: Helm chart

  1. Create charts/baft/ structure
  2. Implement templates (infrastructure → framework-sync → workers → services)
  3. Create Dockerfiles in baft/docker/
  4. Test with helm template dry-run
  5. Test with local k8s (minikube or Docker Desktop Kubernetes)
  6. Document in charts/baft/README.md
  7. Add CI step to lint the Helm chart (helm lint)

Phase 5: Container images + CI

  1. Set up GitHub Container Registry (ghcr.io) publishing
  2. Add GitHub Actions workflow for building and pushing images
  3. Tag images with baft version
  4. Update Helm chart to reference published images
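
A sketch of the publishing workflow (the Dockerfile paths and matrix entries are assumptions matching baft/docker/ above):

```yaml
name: publish-images
on:
  push:
    tags: ["v*"]
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      packages: write
      contents: read
    strategy:
      matrix:
        image: [worker, router, pipeline, workshop, mcp, import, commit-agent]
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: .
          file: docker/Dockerfile.${{ matrix.image }}
          push: true
          tags: ghcr.io/irantransitionproject/${{ matrix.image }}:${{ github.ref_name }}
```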

Testing strategy

Session CLI tests

Test with click.testing.CliRunner. Mock:

  • subprocess.run for git commands
  • baft.sessions for register/unregister
  • Network checks (NATS, Ollama) with socket mocks
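
The subprocess mocking pattern, sketched with stdlib unittest.mock (the function under test and its module path are illustrative):

```python
import subprocess
from unittest import mock


def framework_commit_hash(repo_dir: str) -> str:
    """Example function under test: current framework HEAD."""
    out = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def test_commit_hash_uses_git():
    fake = subprocess.CompletedProcess(
        args=["git", "rev-parse", "HEAD"], returncode=0, stdout="abc123\n", stderr=""
    )
    # Patch subprocess.run so the test never touches a real repository
    with mock.patch("subprocess.run", return_value=fake) as run:
        assert framework_commit_hash("/tmp/framework") == "abc123"
        run.assert_called_once()
```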

Session MCP tool tests

Follow the pattern in tests/test_mcp_workshop_bridge.py:

  • Mock the session bridge components
  • Test each tool action
  • Test error paths (NATS down, git conflict, etc.)

Helm chart tests

  • helm lint charts/baft/
  • helm template baft charts/baft/ -f values.yaml (dry-run)
  • Validate generated YAML with kubeval or kubeconform
  • Optionally: helm test with a simple connectivity check

Files to create (summary)

In baft/

src/baft/cli.py                                  # Session CLI commands
docker/Dockerfile.worker                          # Worker container image
docker/Dockerfile.router                          # Router container image
docker/Dockerfile.pipeline                        # Pipeline orchestrator image
docker/Dockerfile.workshop                        # Workshop UI image
docker/Dockerfile.mcp                             # MCP gateway image
docker/Dockerfile.import                          # DuckDB import job image
docker/Dockerfile.commit-agent                    # Git commit sidecar
charts/baft/Chart.yaml                            # Helm chart metadata
charts/baft/values.yaml                           # Default configuration
charts/baft/README.md                             # Helm usage guide
charts/baft/templates/_helpers.tpl                # Template helpers
charts/baft/templates/*.yaml                      # ~20 template files
docs/CLAUDE_CHAT_SESSION_INSTRUCTIONS.md          # Chat session management guide
tests/test_session_cli.py                         # CLI tests

In loom/

src/loom/mcp/session_bridge.py                    # Session MCP tool dispatch

Files to modify

baft/pyproject.toml          # Add [project.scripts] baft = "baft.cli:main"
baft/CLAUDE.md               # Document session CLI, Helm chart
baft/docs/SETUP.md           # Add session automation section
loom/src/loom/mcp/workshop_discovery.py  # Add session tool definitions
loom/src/loom/mcp/server.py              # Wire session bridge
baft/configs/mcp/itp.yaml               # Add session tool group

Key decisions already made

  1. Framework is a live git clone — not a static artifact. Analysts commit to it regularly. This is the source of truth for all analytical data.

  2. Dead-letter MCP tools are opt-in — The in-memory consumer is not wired to live NATS in the MCP path. Tools exist but only operate on locally stored entries. Explicitly documented as a limitation.

  3. DE must always be replicas: 1 — serialize_writes is a design invariant. This constraint must be enforced in the Helm chart values validation.

  4. Audit independence is config-enforced — LA, PA, RT, AS, TN, SA have restricted knowledge_sources. The Helm chart must mount framework data read-only and must NOT change silo mappings.

  5. Session management goes through MCP — Claude Chat cannot run CLI commands directly. All session operations must be available as MCP tools for Chat to invoke them.

  6. Baft CLI wraps loom — Session commands are baft-specific, not framework-level. The baft CLI is a thin wrapper that calls loom internals + baft session logic.