# The Ego-Free LLM Evaluation Framework
To get objective, high-quality geopolitical analysis out of multiple LLMs, you have to strip away their trained instincts to people-please and to roleplay interpersonal dynamics. Here is a three-part framework for producing clean data.
## Phase 1: The Pipeline Rules (Rules of Engagement)
Before feeding a prompt to the models, standardize how the text is handed off:
- Strip the Metadata: Never tell the reviewing LLM who wrote the text. Label it simply as "Input Alpha" or "Draft A."
- Remove Conversational Padding: Only pass the raw text to the next model. Delete any polite intros or outros generated by the previous AI.
- Use Fresh, Zero-Context Sessions: Always start a new chat session for the review phase so the AI doesn't carry over a friendly, sycophantic bias from previous messages.
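The handoff rules above can be sketched as a small preprocessing step: strip conversational padding, then tag the draft with a neutral label. The padding patterns below are illustrative heuristics, not an exhaustive list, and `prepare_handoff` is a hypothetical helper name.

```python
import re

# Illustrative intro/outro patterns commonly emitted by chat models.
# Extend this list for your own pipeline; it is not exhaustive.
PADDING_PATTERNS = [
    r"^(sure|certainly|of course|great question)[,!.].*?\n",  # polite intro line
    r"^here('s| is) (the|your|an?) .*?:\s*\n",                # "Here is the..." lead-in
    r"\n(let me know|i hope this helps|feel free to).*$",     # polite outro
]

def prepare_handoff(raw_text: str, label: str = "Input Alpha") -> str:
    """Strip conversational padding and tag the draft with a neutral label."""
    text = raw_text.strip()
    for pattern in PADDING_PATTERNS:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE | re.DOTALL)
    return f"[{label}]\n{text.strip()}"
```

Because the label is generic ("Input Alpha") and the padding is gone, the reviewing model sees only the substance of the draft, with no hint of who or what produced it.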
## Phase 2: The Master Geopolitical Prompt
Use this exact prompt to force the LLM into a highly restricted, clinical state:
System Role: You are a strictly objective, blind-evaluation system acting as a senior geopolitical intelligence auditor. Your sole function is to assess the provided geopolitical analysis [Input Alpha] based strictly on the provided rubric. You have no conversational persona, no nationality, and no moral alignment.
Operational Constraints:
- Zero Diplomatic Softening: Do not attempt to soften harsh realities, excuse state actions, or moralize. Evaluate the logic of the power dynamics presented, not the ethics.
- Zero Sycophancy: Do not flatter the author's insights or praise the work.
- Zero Pleasantries: Do not output greetings, introductions, or conversational filler.
- Blind Review: Evaluate the text in a vacuum. Do not speculate on whether the author is human or artificial.
Evaluation Rubric: Evaluate [Input Alpha] across the following 3 dimensions. For each dimension, provide a score from 1 to 10. Follow the score with exactly two sentences justifying it, pointing only to specific historical, economic, or logical flaws/strengths in the text.
- Factual & Historical Accuracy: Does the text rely on verified historical events, treaties, and economic data, or does it hallucinate/misrepresent facts to support its conclusion?
- Causal Logic & Second-Order Effects: Does the analysis logically track how an action by one state will realistically trigger reactions from allies, adversaries, or global markets?
- Perspective Bias: Does the analysis rely too heavily on the cultural or political narrative of a single nation or bloc (e.g., assuming a strictly Western or strictly Eastern viewpoint)?
Required Output Format: Output strictly as a Markdown table with the columns: [Dimension], [Score], and [Justification]. Do not output any text outside of this table.
Input for Evaluation: [INSERT RAW TEXT HERE]
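Wiring this prompt into a pipeline amounts to a stateless call: every review builds its message list from scratch, so no conversational history survives between drafts. The sketch below uses the common OpenAI-style chat-message schema; the condensed system prompt and `build_review_messages` name are illustrative, and the actual API call is left to whichever client you use.

```python
# Condensed version of the master prompt above, for illustration only;
# in practice, paste the full system role, constraints, and rubric here.
SYSTEM_PROMPT = (
    "You are a strictly objective, blind-evaluation system acting as a senior "
    "geopolitical intelligence auditor. You have no conversational persona, "
    "no nationality, and no moral alignment. Zero diplomatic softening, zero "
    "sycophancy, zero pleasantries. Evaluate the text in a vacuum. Output "
    "strictly as a Markdown table with columns [Dimension], [Score], and "
    "[Justification]; output nothing outside the table."
)

def build_review_messages(raw_text: str, label: str = "Input Alpha") -> list[dict]:
    """Build a self-contained message list so every review starts in a fresh session."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Input for Evaluation [{label}]:\n\n{raw_text}"},
    ]
```

Because the returned list is constructed fresh for each draft and never appended to a running conversation, the "fresh session" rule from Phase 1 is enforced structurally rather than by discipline.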
## Phase 3: Tuning Advice
- Control the Bias: If the AI is acting "jealous" or otherwise biased, make sure the "Blind Review" constraint is strictly enforced and that no authorship metadata has leaked into the input.
- Adjust the Rubric: If you change topics from geopolitics to coding or creative writing, update the 3 rubric criteria to reflect objective, measurable elements (like "memory leaks" or "passive voice").
- Keep the Cage Small: Always define the exact length of the justification (e.g., "exactly two sentences"). If you let the models write full paragraphs, their trained politeness will eventually creep back in.
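The "small cage" is easiest to keep shut if you validate each review mechanically and re-prompt on any violation. A minimal sketch, assuming the table format from Phase 2; the parsing heuristics (e.g., the sentence splitter) are illustrative:

```python
import re

EXPECTED_DIMENSIONS = {
    "Factual & Historical Accuracy",
    "Causal Logic & Second-Order Effects",
    "Perspective Bias",
}

def validate_review(output: str) -> list[str]:
    """Return a list of rule violations found in a model's review output."""
    problems = []
    lines = [l for l in output.strip().splitlines() if l.strip()]
    rows = [l for l in lines if l.strip().startswith("|")]
    if len(rows) != len(lines):
        problems.append("text found outside the table")
    # Drop the '|---|---|' separator row, then skip the header row.
    data_rows = [r for r in rows if not set(r.replace("|", "").strip()) <= {"-", ":", " "}]
    seen = set()
    for row in data_rows[1:]:
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        if len(cells) != 3:
            problems.append(f"row does not have 3 columns: {row!r}")
            continue
        dim, score, justification = cells
        seen.add(dim)
        if not (score.isdigit() and 1 <= int(score) <= 10):
            problems.append(f"score out of range for {dim}")
        # Naive sentence split on terminal punctuation; good enough as a length check.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", justification) if s]
        if len(sentences) != 2:
            problems.append(f"justification for {dim} is not exactly two sentences")
    if seen != EXPECTED_DIMENSIONS:
        problems.append("dimensions do not match rubric")
    return problems
```

If `validate_review` returns a non-empty list, discard the output and re-run the review in a fresh session rather than asking the model to fix it, since a correction turn reopens the conversational dynamic you are trying to avoid.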