How AI works · part 3 of 4
Where they help
Five jobs these tools do well in professional work, where each fails, and the discipline each requires.
Most of what is written about these tools is either promotion or dismissal, and neither helps you decide what to delegate to them.
This page sets out five jobs the tools do well in professional work, how each goes wrong, and the discipline each requires. The judgments are ours, from use.
One principle sits under all five. These tools are strongest when you bring the material with you: give them the document, the statute, the email chain, the draft, and ask them to work on that. They are weakest when you ask them to recall facts you did not give them.
The jobs below run roughly from the first kind of work to the second, and reliability falls as they go. The fifth is the exception: it works on your own draft, and it is one of the safer ones.
1 · Summarising and digesting
Give the tool a long document — a judgment, a contract, a transcript, a bundle of correspondence — and ask what it says. The material is in front of it, and condensing text is the thing the engine does best. A serviceable summary of a hundred pages arrives in seconds, and you can interrogate it: what does it say about termination? Where does the witness deal with the meeting?
It fails by omission and emphasis. A summary is a judgment about what matters, and the model's sense of what matters is generic, not yours. The clause that decides your issue can be compressed into a list it thought routine, and the middle of a very long document is where detail goes missing.
The discipline: treat the summary as a map, not a substitute. Use it to find the parts that matter, then read those parts yourself. Never quote a document on the strength of its summary alone.
2 · First drafts and redrafting
A first draft of a letter, the structure of a memo, a clumsy paragraph rewritten plainly, the same advice reframed for a lay client: drafting and redrafting from material and instructions you supply is the second-strongest job, and for many professionals the most useful day to day. It deals with the blank page, which is where a great deal of professional time goes.
Two failure modes. The prose defaults to a generic, faintly inflated register, and a document that matters will not sound like you until you make it. More seriously, a fluent draft invites fluent invention: a confident recital of a fact you never gave it, a plausible date, an assumed term. The errors hide in the fluency.
The discipline: the draft is raw material, and signing it makes it yours. Read it as you would a junior's work, remembering that this junior writes well, never flags its own doubts, and makes up anything it does not know. Check every fact, date, and term in the draft against what you actually supplied.
3 · Transforming and structuring
Notes into prose. A rambling email chain into a chronology. A table into narrative, a narrative into a table, one format into another. Unglamorous work, and the tools are good at it: the shape of the material changes while the substance stays yours.
For many professionals this is where the most value quietly accrues.
The characteristic failure is silent loss. Restructuring sixty items, the model may deliver fifty-seven, neatly, and nothing tells you three are gone. Dates transpose. Near-duplicate entries merge. The output looks complete because it looks tidy, and tidy is what you asked for.
The discipline: count in, count out. Where completeness matters — a chronology, a schedule, a list of parties — check the tally against the source, and spot-check the entries that would hurt if they were wrong.
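For a long list, the tally can be checked mechanically. The sketch below is illustrative only, and it assumes something the text does not promise: that each entry carries a key that should survive restructuring unchanged, such as a date or a reference number. The function name and the sample dates are hypothetical.

```python
from collections import Counter


def count_in_count_out(source_keys, output_keys):
    """Compare entry keys before and after restructuring.

    Assumption: each entry has a key (here a date string) that the
    restructured version should reproduce exactly.
    """
    src = Counter(source_keys)
    out = Counter(output_keys)
    return {
        "count_in": sum(src.values()),
        "count_out": sum(out.values()),
        # Keys in the source with no counterpart in the output:
        # dropped entries, or near-duplicates that were merged.
        "missing": sorted((src - out).elements()),
        # Keys in the output that the source never had:
        # a transposed date shows up here.
        "unexpected": sorted((out - src).elements()),
    }


# Hypothetical data: four source entries, two sharing a date.
source_dates = ["2023-01-04", "2023-01-04", "2023-02-11", "2023-03-12"]
# The restructured version merged the two same-day entries
# and transposed the day and month of the last one.
output_dates = ["2023-01-04", "2023-02-11", "2023-12-03"]

report = count_in_count_out(source_dates, output_dates)
for name, value in report.items():
    print(f"{name}: {value}")
```

A clean report shows only that the keys match, not that the descriptions attached to them are right, so the spot-check still has to be done by reading.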
4 · Research and retrieval
Asking the tool what the law is, what the cases say, what the literature shows. This is the job people most want and the one the raw LLM engine does worst.
It sits at the "facts you did not give it" end of the spectrum, where the confident-but-wrong failure lives: the invented case with the plausible citation is not a rare blooper but the predictable result of asking a prediction engine to recall.
Used for what it is, it can still help: a way into an unfamiliar topic, the vocabulary to search properly, angles you had not considered. Useful, none of it citable. Tools grounded in real databases — legal research products, or a chatbot with search switched on — do better, because they retrieve before they write. But retrieval narrows the failure rather than removing it, and the model can still misstate what it found.
The discipline is absolute: verify every authority against the source before you rely on it, every time, no exceptions.
Nothing the model says about a case is evidence of what the case says.
5 · Review and critique
Show the tool your draft and ask it to argue back: what is wrong with this, where are the gaps, what would the other side say, what have I assumed? In our experience this is the most underrated job on the list, and one of the safer ones, because it produces questions for you rather than text you might rely on.
The failure mode is flattery. As part two explains, these models lean agreeable, and asked for an opinion they will find merit. The review you need is the one you have to insist on.
The discipline: ask for the attack, not the compliment. "Find the three weakest points and press them" produces something useful; "what do you think of this?" produces warmth. And the critique is a prompt for your judgment, not a verdict to adopt. It can be confidently wrong about a weakness, too.
The other half of the question
Everything above is about whether the tool does the job well. Whether you can give it the material at all — client information, personal information, privileged advice — is a separate question, and it is the one our companion module answers: what is safe to put into which tool, on which plan. The two questions together are what using these tools well means.