
Analyst Guide — Working with Baft

Audience: ITP analysts using Baft through Claude Desktop, Claude Code, or the Workshop web UI. No programming knowledge required.


What Baft does for you

Baft is the engine behind your analytical workflow. When you chat with Claude and ask it to process a source, run an analysis, or update the database, Baft handles the structured work behind the scenes. It:

  • Extracts factual claims from source material (news articles, Telegram channels, reports)
  • Analyzes claims against the ITP analytical framework (variables, scenarios, traps, gaps)
  • Validates cross-references and consistency before writing changes
  • Persists validated results to the YAML database
  • Audits publication-bound analysis through blind review (three independent reviewers)
  • Monitors your session quality and cognitive load
  • Scans watch list items and narrative patterns daily

You interact with Baft through Claude. Claude sees Baft's capabilities as tools it can call on your behalf.


Your tools

When Claude is connected to Baft, it has access to these tools:

Direct worker tools

process_sources — Extracts structured claims from raw text. Use when you have new source material to process.
analyze_intelligence — Produces analytical output (observations, variable assessments, scenario updates). Use after source processing, or for standalone analysis.
update_database — Writes validated changes to the YAML database. Use after analysis produces an integration spec.
validate_cross_refs — Checks entity IDs, module codes, and relationship consistency. Use before database commits.
submit_input — Captures a quick note or observation for later processing. Use for time-sensitive findings that need immediate capture.

Pipeline tools

run_quick_pipeline — Stages: XV validate -> DE write. Use for simple field updates, status changes, formatting fixes.
run_standard_pipeline — Stages: SP -> IA -> XV -> DE. Use for new source integration, variable updates, gap analysis.
run_audit_pipeline — Stages: TN -> [LA + PA + RT] -> AS. Use before publishing any brief or major thesis revision.
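The three pipelines differ only in which worker stages they chain together. A minimal sketch of that composition (stage codes are from the tables above; the dictionary layout is hypothetical, not Baft's actual API):

```python
# Hypothetical sketch: each pipeline as an ordered list of stage codes.
# A nested list marks stages that run in parallel (the Tier 3 reviewers).
PIPELINES = {
    "quick":    ["XV", "DE"],                      # Tier 1: validate, then write
    "standard": ["SP", "IA", "XV", "DE"],          # Tier 2: full source integration
    "audit":    ["TN", ["LA", "PA", "RT"], "AS"],  # Tier 3: neutralize, review in parallel, synthesize
}

def stage_count(pipeline: str) -> int:
    """Count worker stages, flattening parallel groups."""
    total = 0
    for stage in PIPELINES[pipeline]:
        total += len(stage) if isinstance(stage, list) else 1
    return total
```

This is only a mental model: the quick pipeline touches two workers, the standard pipeline four, and the audit pipeline five.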

Query tools

itp_search — Full-text search across all entities
itp_filter — Filter entities by type, status, confidence, epistemic tag
itp_stats — Aggregate statistics (counts by type, status, tag)
itp_get — Get a single entity by ID

Workshop tools (for tuning and evaluation)

These tools let you manage worker configurations, test workers, and track quality:

workshop.worker.list — List all worker configs with name and tier
workshop.worker.get — View a worker's full configuration
workshop.worker.update — Update a worker's system prompt or settings
workshop.worker.test — Test a worker against a sample payload
workshop.eval.run — Run an evaluation suite against a worker
workshop.eval.compare — Compare eval results against a quality baseline
workshop.impact.analyze — See which pipelines are affected by changing a worker
workshop.deadletter.list — View failed/unroutable tasks
workshop.deadletter.replay — Retry a failed task

Common workflows

Processing new source material (Tier 2)

This is the most common workflow. You have a new report, article, or Telegram message to integrate.

What to say to Claude:

Here is a new report from [source]. Process this through the standard pipeline.

[paste or attach source text]

What happens behind the scenes:

  1. SP (Source Processor) extracts factual claims with epistemic tags (Fact, Inference, Uncertain, Speculation)
  2. IA (Intelligence Analyst) analyzes claims against the framework, produces observations, variable assessments, and an integration spec
  3. XV (Cross-Validator) checks that all entity references are valid and consistent
  4. DE (Database Engineer) writes the validated changes to the YAML database

If IA flags the analysis as publication-ready, Baft automatically escalates to a Tier 3 audit.

What can go wrong and what to do:

SP produces few or no claims — Likely cause: source text too short or ambiguous. Ask Claude to show SP's raw output; provide more context.
XV fails validation — Likely cause: entity IDs don't match existing records. Review the entity refs in IA's output; correct and resubmit.
Pipeline times out — Likely cause: worker or LLM backend overloaded. Wait a minute and try again; check with your tech support.

Quick database update (Tier 1)

For simple changes that don't need full analysis — status updates, formatting fixes, adding a note to an existing observation.

What to say to Claude:

Update the status of variable VAR-042 to "active"

or

Add this observation to OBS-100: "Recent reporting confirms continued activity"

What happens: XV validates the entity reference, then DE writes the change directly.

Publication audit (Tier 3)

Before publishing a brief or making a major thesis revision, run a blind audit. Three independent reviewers examine a neutralized version of your analysis.

What to say to Claude:

Run a publication audit on Brief BR-015 before we publish.

What happens:

  1. TN (Terminology Neutralizer) strips ITP-specific terms so reviewers can't identify the framework
  2. LA (Logic Auditor) checks logical reasoning and argument structure
  3. PA (Perspective Auditor) evaluates for perspective bias and blind spots
  4. RT (Red Teamer) challenges core claims and looks for alternative explanations
  5. AS (Audit Synthesizer) merges all three reviews into an actionable report

All three reviewers run in parallel and are completely blind — they cannot see the ITP framework, only the neutralized text.

Reading the audit report:

The report includes:

  • Overall verdict: Pass, Pass with Revisions, or Escalate
  • Logic findings: Gaps in reasoning, unsupported claims, circular arguments
  • Perspective findings: Bias indicators, missing viewpoints, assumptions
  • Red team challenges: Each with a strength score (1-10); scores >= 8 trigger escalation
  • Integration patch: Suggested changes to the original analysis
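The red-team escalation rule above is mechanical and can be sketched directly (the function name and input shape are illustrative, not Baft's actual code):

```python
# Red-team challenges are scored 1-10; any score at or above 8 forces
# the "Escalate" verdict rather than "Pass" or "Pass with Revisions".
ESCALATION_THRESHOLD = 8

def needs_escalation(challenge_scores: list[int]) -> bool:
    """True if any red-team challenge is strong enough to force escalation."""
    return any(score >= ESCALATION_THRESHOLD for score in challenge_scores)
```

So a report with challenge scores of 3, 5, and 8 escalates, while one topping out at 7 can still pass.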

Querying the database

You can search, filter, and get statistics about entities at any time.

Examples:

How many active observations do we have by epistemic tag?

Show me all gaps related to nuclear program

What is the current status of entity ENT-042?

Find all variables with confidence below 0.5
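Behind queries like the last one, itp_filter applies simple predicates over entity records. A toy approximation (the field names mirror the examples above, but the real Baft schema may differ):

```python
# Toy entity records; the real database holds far richer YAML entities.
entities = [
    {"id": "VAR-001", "type": "variable", "confidence": 0.7},
    {"id": "VAR-002", "type": "variable", "confidence": 0.4},
    {"id": "OBS-100", "type": "observation", "confidence": 0.9},
]

def filter_entities(entities, entity_type=None, max_confidence=None):
    """Roughly what itp_filter does: keep entities matching every given predicate."""
    out = []
    for e in entities:
        if entity_type is not None and e["type"] != entity_type:
            continue
        if max_confidence is not None and e["confidence"] >= max_confidence:
            continue
        out.append(e)
    return out
```

"Find all variables with confidence below 0.5" would then reduce to filter_entities(entities, entity_type="variable", max_confidence=0.5).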


Improving worker quality

Over time you may want to tune how workers behave — adjusting their instructions, testing with different inputs, or comparing quality across model changes.

Testing a worker

Ask Claude to test a worker with a specific input:

Test the source processor with this sample text: [text]

Claude calls workshop.worker.test and returns the worker's output along with timing, token usage, and schema validation results.

Running an evaluation suite

An eval suite is a set of test cases with known expected outputs. Running one shows you how well a worker performs:

Run the eval suite for the source processor

Claude calls workshop.eval.run and returns scores for each test case. Scoring methods:

  • Field match — checks that specific output fields contain expected values
  • Exact match — checks for exact output equality
  • LLM judge — uses a separate LLM to evaluate quality on correctness, completeness, and format

Comparing against a baseline

After establishing a "golden" eval run as your quality baseline, you can compare new runs against it to detect regressions:

Compare this eval run against the baseline for the source processor

This shows per-case improvements and regressions, helping you catch quality degradation before it affects your work.
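Conceptually, the comparison is a per-case score diff against the golden run. A minimal sketch (the dict-of-scores shape is an assumption, not the actual workshop.eval.compare payload):

```python
def compare_runs(baseline: dict, current: dict) -> dict:
    """Diff per-case scores: negative delta = regression, positive = improvement."""
    deltas = {case: current[case] - baseline[case]
              for case in baseline if case in current}
    return {
        "regressions": [c for c, d in deltas.items() if d < 0],
        "improvements": [c for c, d in deltas.items() if d > 0],
    }
```

A case that drops from 0.9 to 0.7 is flagged as a regression even if the overall average still looks healthy, which is exactly why per-case comparison matters.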

Checking change impact

Before changing a worker's configuration, check what else it affects:

What pipelines would be affected if I change the intelligence analyst?

Claude calls workshop.impact.analyze and shows you which pipelines use that worker, which downstream stages depend on it, and the risk level (high if downstream stages exist).
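The risk heuristic is simple: a worker with stages running after it is riskier to change, because its output feeds them. A sketch under that assumption (pipeline definitions here are illustrative; the parallel audit stages are omitted for simplicity):

```python
# Illustrative linear pipelines only; the audit pipeline's parallel
# stage group is left out to keep the sketch simple.
PIPELINES = {
    "quick":    ["XV", "DE"],
    "standard": ["SP", "IA", "XV", "DE"],
}

def impact(worker: str):
    """For each pipeline using the worker, list the stages that run after it."""
    affected = {name: stages[stages.index(worker) + 1:]
                for name, stages in PIPELINES.items() if worker in stages}
    risk = "high" if any(affected.values()) else "low"
    return affected, risk
```

Changing IA is high risk (XV and DE consume its output); changing DE, the final stage, is low risk.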


Monitoring and debugging

The TUI dashboard

If you have a terminal available, you can watch pipeline execution in real time:

uv run loom ui --nats-url nats://localhost:4222

This shows four panels:

  • Goals — active pipeline goals, their status, and how long they've been running
  • Tasks — individual worker tasks, which model tier is handling them, elapsed time
  • Pipeline — stage-by-stage execution with wall time per stage
  • Events — scrolling log of all system messages

Keyboard shortcuts: q to quit, c to clear the event log, r to refresh

The TUI is read-only — it observes what's happening but never changes anything. Safe to run alongside production work.

Failed tasks (dead-letter queue)

Sometimes tasks fail — a worker times out, an LLM produces invalid output, or a network glitch interrupts communication. These failed tasks land in the dead-letter queue.

Viewing failed tasks:

Show me the dead-letter queue

Claude calls workshop.deadletter.list and shows you each failed task with:

  • What worker it was intended for
  • Why it failed
  • When it failed

Retrying a failed task:

Replay dead-letter entry DL-042

Claude calls workshop.deadletter.replay, which re-submits the task to the router. Every replay is recorded in the audit trail for governance reviews.

Pipeline reliability

Baft automatically retries failed pipeline stages:

  • Local tier workers (SP, XV, TN, DE) retry up to 2 times — these use fast local models, so retries are cheap
  • Standard and frontier tier workers (IA, LA, PA, RT, AS) retry up to 1 time — these use expensive API calls, so retries are conservative
  • Only transient failures are retried (timeouts, temporary errors). If a worker produces output that fails schema validation, it won't be retried — that's a configuration issue, not a transient failure
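That retry policy can be summarized in a few lines (the failure-kind labels are hypothetical; the retry budgets are the ones stated above):

```python
# Retry budgets per tier: local retries are cheap, API-backed tiers conservative.
MAX_RETRIES = {"local": 2, "standard": 1, "frontier": 1}
# Only transient failure kinds are ever retried.
TRANSIENT = {"timeout", "temporary_error"}

def should_retry(tier: str, failure_kind: str, attempts_so_far: int) -> bool:
    """Retry only transient failures, and only within the tier's budget."""
    if failure_kind not in TRANSIENT:  # e.g. schema validation: a config issue
        return False
    return attempts_so_far < MAX_RETRIES[tier]
```

A local worker timing out twice still gets no third retry, and a schema-validation failure is never retried at any tier.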

The Workshop web UI

For more hands-on worker management, you can use the Workshop web interface:

uv run loom workshop --port 8080

Open http://localhost:8080 in your browser. The Workshop provides:

  • Worker list — all 13 workers with their tier, description, and status
  • Test bench — test any worker with custom inputs and see full outputs
  • Eval dashboard — run evaluation suites, compare against baselines, track quality over time
  • Pipeline editor — view and modify pipeline stage configurations
  • Dead-letter inspector — browse failed tasks with full details

Understanding the tier system

Every analytical task runs at a specific tier, which determines the LLM model used:

Local — Ollama (llama3.2:3b). Cost: free. Speed: fast (3-7s). Used for SP, DE, XV, IN, TN, SA — mechanical tasks.
Standard — Claude Sonnet. Cost: moderate. Speed: medium (5-15s). Used for LA, PA, AS, WT, NI — analytical tasks.
Frontier — Claude Opus. Cost: high. Speed: slower (10-30s). Used for IA, RT — complex reasoning tasks.

The system automatically selects the right tier for each worker. You don't need to think about this — it's handled by the worker configurations.
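The selection reduces to a static worker-to-tier lookup, something like this (a sketch of the mapping implied by the table above, not Baft's actual configuration format):

```python
# Hypothetical worker-to-tier map; in Baft this lives in the worker configs.
WORKER_TIER = {
    **dict.fromkeys(["SP", "DE", "XV", "IN", "TN", "SA"], "local"),
    **dict.fromkeys(["LA", "PA", "AS", "WT", "NI"], "standard"),
    **dict.fromkeys(["IA", "RT"], "frontier"),
}
```

Thirteen workers in total, with only the two heaviest reasoning tasks (IA, RT) on the frontier tier.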


Understanding epistemic tags

Every claim extracted from source material gets an epistemic tag:

Fact — Directly observable or verifiable. Confidence band: 0.8-1.0.
Inference — Logically derived from known facts. Confidence band: 0.5-0.8.
Uncertain — Plausible but unverified. Confidence band: 0.3-0.5.
Speculation — Hypothetical, requires significant assumptions. Confidence band: 0.0-0.3.

These tags flow through the entire pipeline — from SP's extraction through IA's analysis to DE's database writes. They help you and the audit system assess the reliability of analytical conclusions.
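The bands map a confidence score to a tag. A sketch of that mapping, treating each band boundary as the inclusive lower edge of the higher tag (the table's shared boundaries, e.g. 0.8, are ambiguous, so this is an assumption):

```python
def epistemic_tag(confidence: float) -> str:
    """Map a confidence score to its epistemic tag.

    Boundary values are assigned to the higher band; the source table
    leaves boundary ownership unspecified.
    """
    if confidence >= 0.8:
        return "Fact"
    if confidence >= 0.5:
        return "Inference"
    if confidence >= 0.3:
        return "Uncertain"
    return "Speculation"
```

So a claim scored 0.6 would carry the Inference tag throughout the pipeline, from SP's extraction to DE's database write.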


Daily routine

Before your session

  1. Pull latest framework data (if others have been working):
cd ~/IranTransitionProject/framework && git pull
  2. Update DuckDB (if framework changed):
cd ~/IranTransitionProject/baft
uv run python pipeline/scripts/itp_import_to_duckdb.py --incremental
  3. Start workers (if not already running):
bash scripts/run_workers.sh

During your session

Work through Claude as described above. The standard pattern:

  1. Process sources (Tier 2 pipeline)
  2. Review IA's output for accuracy
  3. Confirm or reject XV's validation
  4. Commit approved changes to the framework

After your session

cd ~/IranTransitionProject/framework
git add -A
git commit -m "Session: [date] — [brief description]"
git push

The framework repository is the analytical source of truth. Every session's work should be committed and pushed.


Getting help

Claude doesn't see Baft tools — Ask tech support. Tell them "MCP tools not appearing"; they'll check the config.
Worker produces wrong output — Review the system prompt. Use workshop.worker.get to see the current config.
Pipeline keeps timing out — Ask tech support. Tell them which pipeline, what input, and the error message.
Quality has degraded — Run an eval comparison. Use workshop.eval.compare against your baseline.
Need to change how a worker behaves — Use workshop.worker.update or ask tech support. Describe what output you expect vs. what you're getting.

For detailed technical troubleshooting, see the Operations Guide.

For connection setup, see the Claude Desktop Guide.