An Experiment in Collaborative Science Writing
A 4-hour experiment producing a graduate-level methodology guide on statistical modelling in environmental science, co-authored with ChatGPT, Claude, and Gemini.
The Premise
I'm a physical oceanographer. I know marine heatwaves, Antarctic oceanography, and climate variability. What I don't have is deep expertise in statistical modelling methodology. Yet I kept noticing the same conceptual errors in the literature — models described as explaining mechanisms when they could only demonstrate associations, and predictions made without any out-of-sample validation.
Good papers on this exist — Shmueli (2010), Tredennick et al. (2021) — but they're scattered, often highly mathematical, and rarely accessible to a PhD student without a statistics background. The gap was pedagogical, not empirical.
So I tried something: use AI not as a writing assistant, but as a thinking partner — to do the deep literature review, generate structure, draft content, and receive peer review from other AI systems. My role was to be the critical scientific editor, not the writer.
Whether this paper would pass peer review, I genuinely don't know — I'd be too embarrassed to find out. But as a demonstration of what AI-assisted scholarship could look like, it raises real questions about authorship, expertise, and the future of academic writing.
The Process
Each step had a distinct purpose. The human's job was to direct, interrogate, and judge — not to write.
Research Phase
Mini literature reviews were conducted on several aspects of statistical modelling in environmental science — inference vs prediction, validation, confounding, variable selection, and the construct–measurement gap.
ChatGPT (research synthesis)

Architecture Phase
ChatGPT generated an outline covering description, association, prediction, and mechanistic understanding. I reviewed and modified the structure — reordering sections, requesting concrete numerical examples and figures, specifying the target audience.
ChatGPT + Human review

Drafting Phase
Full first draft produced from the agreed structure. Human review involved asking for clarification on technical points, requesting specific worked examples (the coral bleaching and fish growth illustrations), and modifying the structure further.
ChatGPT (primary author)

Multi-Model Peer Review
The first draft was sent to Claude (Anthropic) for independent peer review. Critique focused on logical consistency, missing nuances, and whether the non-mathematical framing was pedagogically sound.
Claude (Anthropic) — Reviewer

Multi-Model Peer Review
Gemini (Google) conducted a second independent review, adding further critique of framing, examples, and completeness. Having two independent reviewers from different model families mirrors genuine peer review practice.
Gemini (Google) — Reviewer

Revision Phase
Reviewer comments were synthesised and fed back to ChatGPT, which produced a revised draft addressing the critiques. The human judged which reviewer comments to accept, reject, or modify.
ChatGPT + Human editorial judgment

Finalisation
Final editing, polishing, and quality judgement by the human author. The result: a 13-page methodology paper with worked examples, figures, a decision checklist, and a glossary — targeted at PhD students with limited statistical background.
The Output
Preprint · Unpublished
A guide to conceptual clarity in environmental statistical modelling. The paper argues that most problems in environmental modelling are not technical — they are conceptual, arising at the interpretation stage rather than the analysis stage. It provides a systematic framework for distinguishing description, association, prediction, and mechanistic explanation, with worked examples from coral bleaching and fish growth ecology.
Core Argument
A regression model cannot, by itself, establish causation. Statistical explanation — accounting for variance — is categorically different from mechanistic understanding. Language should scale with evidence.
Target Audience
PhD students and early-career researchers in environmental science with limited mathematical background. Prioritises conceptual clarity over technical depth.
Key Practical Tool
"Without validation: X is associated with Y. With validation: X can predict Y. With mechanistic evidence: X influences Y via Z." — the language–evidence ladder.
Novel Contribution
Synthesises dispersed, often highly mathematical literature (Shmueli 2010; Tredennick et al. 2021; Breiman 2001) into an accessible, example-driven guide with decision checklists.