An Experiment in Collaborative Science Writing
A 4-hour experiment producing a graduate-level methodology guide on statistical modelling in environmental science, co-authored with ChatGPT, Claude, and Gemini.
The Premise
I'm a physical oceanographer. I know marine heatwaves, Antarctic oceanography, and climate variability. What I don't have is deep expertise in statistical modelling methodology. Yet I kept noticing the same conceptual errors in the literature — models described as explaining mechanisms when they could only demonstrate associations, and predictions made without any out-of-sample validation.
Good papers on this exist — Shmueli (2010), Tredennick et al. (2021) — but they're scattered, often highly mathematical, and rarely accessible to a PhD student without a statistics background. The gap was pedagogical, not empirical.
So I tried something: use AI not as a writing assistant, but as a thinking partner — to do the deep literature review, generate structure, draft content, and receive peer review from other AI systems. My role was to be the critical scientific editor, not the writer.
Whether this paper would pass peer review, I genuinely don't know — I'd be too embarrassed to find out. But as a demonstration of what AI-assisted scholarship could look like, it raises real questions about authorship, expertise, and the future of academic writing.
The Process
Each step had a distinct purpose. The human's job was to direct, interrogate, and judge — not to write.
Research Phase
Mini literature reviews were conducted on several aspects of statistical modelling in environmental science — inference vs prediction, validation, confounding, variable selection, and the construct–measurement gap.
ChatGPT (research synthesis)

Architecture Phase
ChatGPT generated an outline covering description, association, prediction, and mechanistic understanding. I reviewed and modified the structure — reordering sections, requesting concrete numerical examples and figures, specifying the target audience.
ChatGPT + Human review

Drafting Phase
Full first draft produced from the agreed structure. Human review involved asking for clarification on technical points, requesting specific worked examples (the coral bleaching and fish growth illustrations), and modifying the structure further.
ChatGPT (primary author)

Multi-Model Peer Review
The first draft was sent to Claude (Anthropic) for independent peer review. Critique focused on logical consistency, missing nuances, and whether the non-mathematical framing was pedagogically sound.
Claude (Anthropic) — Reviewer

Multi-Model Peer Review
Gemini (Google) conducted a second independent review, adding further critique of framing, examples, and completeness. Having two independent reviewers from different model families mirrors genuine peer review practice.
Gemini (Google) — Reviewer

Revision Phase
Reviewer comments were synthesised and fed back to ChatGPT, which produced a revised draft addressing the critiques. The human judged which reviewer comments to accept, reject, or modify.
ChatGPT + Human editorial judgment

Finalisation
Final editing, polishing, and quality judgement by the human author. The result: a 13-page methodology paper with worked examples, figures, a decision checklist, and a glossary — targeted at PhD students with limited statistical background.
The Output
Preprint · Unpublished
A guide to conceptual clarity in environmental statistical modelling. The paper argues that most problems in environmental modelling are not technical — they are conceptual, arising at the interpretation stage rather than the analysis stage. It provides a systematic framework for distinguishing description, association, prediction, and mechanistic explanation, with worked examples from coral bleaching and fish growth ecology.
Core Argument
A regression model cannot, by itself, establish causation. Statistical explanation — accounting for variance — is categorically different from mechanistic understanding. Language should scale with evidence.
Target Audience
PhD students and early-career researchers in environmental science with limited mathematical background. Prioritises conceptual clarity over technical depth.
Key Practical Tool
"Without validation: X is associated with Y. With validation: X can predict Y. With mechanistic evidence: X influences Y via Z." — the language–evidence ladder.
Novel Contribution
Synthesises dispersed, often highly mathematical literature (Shmueli 2010; Tredennick et al. 2021; Breiman 2001) into an accessible, example-driven guide with decision checklists.