Co-Researcher Benchmark Arena
A live evaluation dashboard for Co-Researcher — a PhD-level research agent suite.
What is Co-Researcher?
Co-Researcher is a professional research suite for Claude Code, Gemini CLI, OpenAI Codex, and OpenCode. It provides PhD-level research skills governed by Systemic Honesty principles — no fabricated citations, no hallucinated data.
Research Skills
- Critical Analysis — logic checking, fallacy detection, bias identification
- Literature Review — systematic search, citation chaining, gap analysis
- Hypothesis Testing — variable mapping, falsification criteria, experimental design
- Quantitative Analysis — statistical power, effect size interpretation, method selection
- Qualitative Research — thematic analysis, coding strategy, saturation assessment
- Peer Review — manuscript critique and methodological evaluation
- Ethics Review — IRB compliance and privacy risk assessment
- Research Synthesis — narrative synthesis with uncertainty quantification
- Systematic Review — PRISMA-standard protocol and Risk of Bias assessment
- Grant Writing — funding strategy and proposal development
- Lateral Thinking — cross-domain analogies and first-principles reasoning
What This Arena Benchmarks
The arena runs 22 test cases across 8 agent types. Evaluated models include Claude (Anthropic) and Codex (OpenAI). Rubrics cover reasoning quality, output structure, analytical depth, quantitative rigor, qualitative methods, and research design. Each result is scored on a four-level scale: Excellent, Good, Fair, or Poor.
Install
- Claude Code:
claude plugins install poemswe/co-researcher
- Gemini CLI:
gemini extension install https://github.com/poemswe/co-researcher
- Source: github.com/poemswe/co-researcher
Part of poemswe.com — AI tools for research and writing.