# Co-Researcher — Full Reference

> Co-Researcher installs structured research protocols into Claude Code, Gemini CLI, OpenAI Codex, and OpenCode. Each skill follows the method a trained researcher uses. Governed by Systemic Honesty: no fabricated citations, no hallucinated data, explicit uncertainty quantification throughout.

## Core Principle: Systemic Honesty

Systemic Honesty is a protocol constraint, not a model instruction. Telling a model "don't fabricate" degrades under pressure. Co-Researcher builds verification into the method at every step: the search step, the citation step, the synthesis step.

Rules enforced in every skill:

- Never fabricate a citation. If a source cannot be verified, say so explicitly.
- Distinguish what the evidence shows from what it suggests.
- Quantify uncertainty: "likely," "insufficient evidence," "conflicting findings."
- Refuse to call a review "comprehensive" when coverage is partial.
- Flag methodological limits before presenting conclusions.
- Accuracy over output count: a shorter, honest answer beats a longer, invented one.

## Research Skills

### Critical Analysis (`/analyze`)

Evaluates the logical strength of arguments, not their surface coherence.

- Fallacy detection: ad hominem, strawman, false dichotomy, appeal to authority, circular reasoning, hasty generalization
- Bias identification: confirmation bias, selection bias, framing effects, motivated reasoning
- Contradictory evidence handling: surfaces counter-evidence before reaching conclusions
- Methodology critique: identifies confounds, sampling errors, measurement problems
- Output: structured critique with specific evidence, not general impressions

### Literature Review (`/review-lit`)

Systematic search with citation verification at every step.
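The refuse-to-cite-unverified rule can be sketched as a filter over candidate citations. A minimal Python illustration; the `Citation` type and `mark_citations` helper are hypothetical, and a real verifier would resolve identifiers against a bibliographic database such as Crossref rather than pattern-matching them:

```python
# Hypothetical sketch: any source without a resolvable identifier is
# tagged UNVERIFIED instead of being cited as established.
import re
from dataclasses import dataclass
from typing import List, Optional

DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")  # well-formed DOI shape

@dataclass
class Citation:
    title: str
    doi: Optional[str] = None  # None means no identifier was supplied

def mark_citations(citations: List[Citation]) -> List[str]:
    """Tag every source that cannot be verified; never drop it silently."""
    marked = []
    for c in citations:
        if c.doi and DOI_PATTERN.match(c.doi):
            marked.append(c.title)
        else:
            marked.append(f"{c.title} [UNVERIFIED: no resolvable identifier]")
    return marked
```

The point is the failure mode: an unverifiable source is surfaced and labeled, never silently included or silently dropped.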
- Search strategy: keyword construction, database selection, Boolean operators
- Citation chaining: backward and forward citation tracing
- Hallucination detection: flags any source it cannot verify against known databases
- Gap analysis: identifies what the field has not studied, not just what it has
- Refuses to cite sources it cannot verify; marks unverified sources explicitly
- Output: structured review with search strategy, inclusion criteria, and gap map

### Hypothesis Testing (`/hypothesis`)

Maps research questions to testable, falsifiable hypotheses.

- Variable mapping: independent, dependent, mediating, moderating variables
- Falsification criteria: explicit conditions under which the hypothesis would be rejected
- Experimental controls: identifies confounds and required control conditions
- Distinguishes testable empirical claims from unfalsifiable conceptual ones
- Output: hypothesis statement, variable map, falsification criteria, proposed controls

### Quantitative Analysis (`/quantitative`)

Selects and interprets statistical methods appropriate to the data and question.

- Statistical method selection: parametric vs. non-parametric, test assumptions
- Effect size interpretation: Cohen's d, r, eta-squared — not just p-values
- Power analysis: sample size requirements for specified effect and alpha
- Simpson's paradox detection: checks for reversal across subgroups
- Common errors flagged: p-hacking, multiple comparisons, HARKing
- Output: method recommendation with assumptions, effect size interpretation, power analysis

### Qualitative Research (`/qualitative`)

Structures qualitative inquiry from design through analysis.
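One recurring qualitative step is the saturation assessment. A minimal sketch, assuming saturation means the most recent interviews add no codes the codebook lacks; the `window` cutoff and the `saturation_reached` helper are illustrative choices, not the plugin's actual rule:

```python
# Illustrative saturation check (not the plugin's implementation):
# saturation is declared when the last `window` interviews contribute
# no codes that earlier interviews had not already produced.
from typing import List, Set

def saturation_reached(codes_per_interview: List[Set[str]], window: int = 2) -> bool:
    if len(codes_per_interview) <= window:
        return False  # too few interviews to judge
    seen: Set[str] = set()
    for codes in codes_per_interview[:-window]:
        seen |= codes  # codebook accumulated before the window
    return all(codes <= seen for codes in codes_per_interview[-window:])
```

In practice saturation is a judgment call informed by negative case analysis, not a hard threshold; this sketch only shows the bookkeeping.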
- Study design: phenomenology, grounded theory, ethnography, case study selection
- Interview protocol: leading-question detection and correction
- Coding strategy: open, axial, selective coding; codebook development
- Thematic analysis: theme identification, saturation assessment, negative case analysis
- Reflexivity: prompts researcher positionality documentation
- Output: codebook, theme map, saturation assessment, reflexivity memo

### Peer Review (`/review`)

Structures feedback the way a journal reviewer would.

- Evaluates: research question clarity, methodological appropriateness, result interpretation, contribution claim
- Rubric-based scoring across: significance, rigor, clarity, contribution
- Flags: overclaiming, underpowered studies, missing limitations sections
- Separates major revisions (methodological) from minor revisions (presentation)
- Output: structured review letter with major/minor concerns, specific line references

### Ethics Review (`/ethics`)

IRB-standard risk assessment for research involving human subjects or sensitive data.

- Participant risk assessment: physical, psychological, privacy, relational harms
- Informed consent review: adequacy of disclosure, voluntariness, comprehension
- Data privacy: anonymization adequacy, re-identification risk, retention policy
- Dual-use concerns: potential for harm if research methods or findings are misapplied
- Vulnerable populations: additional protections for minors, prisoners, cognitively impaired
- Output: risk matrix, consent adequacy assessment, recommended safeguards

### Systematic Review (`/systematic`)

PRISMA-standard protocol for evidence synthesis.
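Systematic reviews report inter-rater reliability for screening and extraction; the standard chance-corrected statistic is Cohen's kappa. A self-contained sketch for two screeners' include/exclude decisions (unweighted kappa; the helper name is illustrative):

```python
# Cohen's kappa: agreement between two raters, corrected for the
# agreement expected by chance from each rater's marginal frequencies.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed proportion of items where the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the marginal label frequencies.
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)
```

Kappa is 1.0 for perfect agreement and 0.0 when agreement is exactly what chance predicts; screening protocols commonly resolve disagreements by discussion when kappa falls below a pre-registered threshold.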
- Protocol development: PICO/PICOS framework, pre-registration checklist
- Search strategy: multi-database, grey literature, hand-searching
- Screening: title/abstract, full-text, with documented inclusion/exclusion criteria
- Risk of bias: RoB 2.0 for RCTs, ROBINS-I for observational studies, Newcastle-Ottawa
- Data extraction: standardized forms, inter-rater reliability
- Output: PRISMA flow diagram, evidence table, narrative synthesis, bias assessment

### Research Synthesis (`/synthesize`)

Narrative synthesis with explicit uncertainty quantification.

- Evidence grading: GRADE framework (high/moderate/low/very low certainty)
- Consistency assessment: identifies convergent and divergent findings across sources
- Mechanism reasoning: distinguishes correlation from proposed causal pathways
- Uncertainty language: calibrated to evidence strength throughout
- Refuses to present weak consensus as established fact
- Output: synthesis narrative, evidence grade table, uncertainty flags, open questions

### Research Methodology (`/methodology`)

Matches research questions to appropriate study designs.

- Design selection: experimental, quasi-experimental, observational, mixed methods
- Validity controls: internal validity threats, external validity limits
- Sampling strategy: probability vs. purposive, sample size rationale
- Measurement: reliability, validity, operationalization quality
- Flags mismatches between research question and proposed design
- Output: design recommendation with rationale, validity analysis, sampling plan

### Grant Writing (`/grant`)

Structures research proposals for competitive funding.
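Both `/quantitative` and `/grant` treat underpowered designs as a red flag. A minimal a priori sample-size sketch for a two-sample comparison, using the normal approximation (the exact t-based answer comes out slightly larger; `n_per_group` is an illustrative helper, not part of the toolkit):

```python
# A priori sample size per group for a two-sample test at Cohen's d,
# via the normal approximation: n = 2 * ((z_{1-a/2} + z_{power}) / d)^2.
from math import ceil
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    z = NormalDist().inv_cdf  # standard-normal quantile function
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)
```

For a medium effect (d = 0.5) at the conventional alpha = 0.05 and 80% power, this gives 63 participants per group; exact t-distribution methods give about 64. The useful habit is running this before data collection, not after.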
- Specific Aims: one-page structure, hook, gap, hypothesis, aims
- Significance and Innovation: NIH-style framing, SBIR/STTR variants
- Approach: experimental design, timeline, risk mitigation
- Agency alignment: matches language and priorities to target funder
- Common weaknesses flagged: overclaiming novelty, underpowered designs, vague timelines
- Output: draft Specific Aims, significance narrative, approach outline

### Lateral Thinking (`/lateral`)

Generates novel research framings through cross-domain reasoning.

- First-principles decomposition: strips the problem to foundational assumptions
- Cross-domain analogy: maps the research problem onto solved problems in other fields
- Constraint satisfaction: identifies which constraints are fixed vs. assumed
- Assumption inversion: asks what happens if the opposite is true
- Output: reframed problem statement, analogies, assumption audit, novel hypotheses

### Research Orchestration (`/research`)

Multi-agent coordination for complex research questions.
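Dependency-aware execution planning can be sketched with a topological sort: tasks whose prerequisites are all complete form one phase and can run in parallel. The `plan_phases` helper and the skill names in the example are illustrative, not the plugin's actual scheduler:

```python
# Hypothetical phase planner: group tasks into sequential phases,
# where every task in a phase has all its dependencies already done.
from graphlib import TopologicalSorter

def plan_phases(deps: dict) -> list:
    """deps maps task -> set of tasks that must finish first."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    phases = []
    while ts.is_active():
        ready = set(ts.get_ready())  # all tasks runnable right now
        phases.append(ready)
        for task in ready:
            ts.done(task)
    return phases
```

For example, if `/synthesize` depends on both `/review-lit` and `/quantitative`, and `/quantitative` depends on `/review-lit`, the planner yields three sequential phases; a wider dependency graph would surface parallel opportunities within a phase.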
- Question analysis: identifies required expertise and sequencing
- Agent selection: matches sub-tasks to appropriate skills
- Execution planning: phases with dependencies, parallel and sequential tasks
- Modes: interactive (review plan before execution), auto (direct execution), plan-only
- Templates: quick (literature scan), rigorous (full systematic review), comprehensive (multi-method)
- Output: execution plan, coordinated agent outputs, integrated findings

## Benchmark Results

Live evaluation at: https://coresearcher.poemswe.com/evals/

- 22 test cases across 8 agent types
- Models evaluated: Claude (Anthropic), Codex (OpenAI), Gemini (Google)
- Rubrics: reasoning quality, output structure, analytical depth, quantitative rigor, qualitative methods, research design
- Scoring: Excellent / Good / Fair / Poor with rubric-by-rubric breakdowns

## Installation

- Claude Code: `claude plugins install poemswe/co-researcher`
- Gemini CLI: `gemini extension install https://github.com/poemswe/co-researcher`
- OpenAI Codex: tell Codex to fetch https://raw.githubusercontent.com/poemswe/co-researcher/main/.codex/INSTALL.md

Source: https://github.com/poemswe/co-researcher

## Links

- Landing page: https://coresearcher.poemswe.com/
- Benchmark arena: https://coresearcher.poemswe.com/evals/
- GitHub: https://github.com/poemswe/co-researcher
- Creator: https://poemswe.com
- Summary (llms.txt): https://coresearcher.poemswe.com/llms.txt