Co-Researcher Benchmark Arena
A live evaluation dashboard for Co-Researcher — a PhD-level research agent suite.
What is Co-Researcher?
Co-Researcher is a professional research suite for Claude Code, Gemini CLI, OpenAI Codex, and OpenCode. It provides PhD-level research skills governed by Systemic Honesty principles — no fabricated citations, no hallucinated data.
Research Skills
- Critical Analysis — logic checking, fallacy detection, bias identification
- Literature Review — systematic search, citation chaining, gap analysis
- Hypothesis Testing — variable mapping, falsification criteria, experimental design
- Quantitative Analysis — statistical power, effect size interpretation, method selection
- Qualitative Research — thematic analysis, coding strategy, saturation assessment
- Peer Review — manuscript critique and methodological evaluation
- Ethics Review — IRB compliance and privacy risk assessment
- Research Synthesis — narrative synthesis with uncertainty quantification
- Systematic Review — PRISMA-standard protocol and Risk of Bias assessment
- Grant Writing — funding strategy and proposal development
- Lateral Thinking — cross-domain analogies and first-principles reasoning
What This Arena Benchmarks
The arena runs 22 test cases across 8 agent types. Evaluated models include Claude (Anthropic) and Codex (OpenAI). Rubrics cover reasoning quality, output structure, analytical depth, quantitative rigor, qualitative methods, and research design. Each result is scored on a four-level scale: Excellent, Good, Fair, or Poor.
Install
- Claude Code:
claude plugins install poemswe/co-researcher
- Gemini CLI:
gemini extension install https://github.com/poemswe/co-researcher
- Source: github.com/poemswe/co-researcher
Part of poemswe.com — AI tools for research and writing.