This is a Tautoru view, not a settled position.
Every major provider reserves the right to have a person read your conversation. The reassurance usually offered, that no one reads your chats, describes a default, not a rule. The default gives way in four situations: when an automated safety classifier (software that scans conversations for signs of harm) flags a conversation as potentially harmful or in breach of the provider's usage policies; when you submit feedback or ask for support; when the provider investigates an account; and when the law compels production. The first matters most, because it is not in your control and you will generally not know it has happened. The classifiers are tuned to the gravest subject matter: violence, weapons, child exploitation, fraud. A flag moves the conversation into a queue where a person may read it and stops the ordinary deletion clock; in the most serious cases the provider may go to the authorities.
None of this is hidden; it is stated in documents most users never open. Anthropic says that by default its employees cannot access conversations, the exceptions being feedback you choose to submit and review needed to enforce its usage policy, where access is limited to designated trust and safety staff and logged. OpenAI routes conversations its systems read as planning harm to others to "specialized pipelines where they are reviewed by a small team", and its enterprise privacy page discloses that authorised employees and "specialized third-party contractors" may access conversations "solely to review for abuse and misuse". Google is the most direct: human reviewers, including contractors, read a sample of consumer Gemini conversations, and the privacy notice asks users not to enter "confidential information that you wouldn't want a reviewer to see". Microsoft splits by product. Microsoft 365 Copilot, the work version, has opted out of human abuse review altogether; on the consumer Copilot, "an opt-out of human review is not available". All of this is as the documents stand at 12 June 2026; they move quickly.
Review can end with the authorities. Anthropic and OpenAI both report child sexual abuse material to the relevant United States authorities as a matter of policy. OpenAI announced in August 2025 that where its reviewers determine a conversation "involves an imminent threat of serious physical harm to others, we may refer it to law enforcement"; it is currently not referring self-harm cases to police. Anthropic's privacy policy permits disclosure to authorities as required by law, and the version taking effect on 8 July 2026 adds disclosure on a good-faith belief that it is reasonably necessary to prevent serious harm. These judgments are made by the provider's staff, applying its policies and United States law, on the text of a conversation, with no context about who wrote it or why. And there is no notice. You will not ordinarily learn that a conversation was read, retained or referred.
For lawyers, this problem is acute. The subject matter these classifiers are built to catch is the subject matter of practice: criminal defence instructions, fraud pleadings, regulatory investigations, family-violence affidavits. A classifier cannot tell a brief of evidence from a plan, or an account of an alleged offence from the offence itself. If client material describing criminal conduct is flagged, a person you cannot identify, in another jurisdiction, may read confidential and possibly privileged material; the material falls out of the ordinary deletion pipeline; and at the serious end it can be referred to overseas law enforcement, without your knowledge or the client's. Set that against the conduct rules. Chapter 8 entrusts the crime-disclosure judgment to the lawyer, within narrow limits: rule 8.2 compels disclosure only of an anticipated crime punishable by three years' imprisonment or more, or where necessary to prevent a serious risk to health or safety, and rule 8.4 permits it in defined cases. A provider's escalation pipeline makes a parallel judgment on the same material, by different people, on another country's standards, with no role for the lawyer at all. On privilege, the analysis on our privilege page holds: review you did not invite is not the voluntary disclosure in circumstances inconsistent with confidentiality that section 65 of the Evidence Act 2006 requires for waiver. But intact privilege is small comfort once the material has in fact been read. Confidentiality in fact is what chapter 8 obliges you to protect, and a flagged conversation is the clearest way to lose it.
The feedback buttons look harmless and are not. Pressing thumbs up or thumbs down is not an anonymous rating; it is consent to hand over the conversation. Anthropic's current privacy policy says that if you rate an output it "will store the entire related conversation as part of your Feedback"; feedback is kept for five years and may be used to train its models, and because feedback sits outside the no-training promise in its commercial terms, business plans are not protected from it either. OpenAI says the same about its training opt-out: "Even if you have opted out of training ... the entire conversation associated with that feedback may be used to train our models." Google keeps reviewed feedback, the associated conversations and related data for up to three years, and a feedback report can carry the previous 24 hours of your chats even when the activity setting is off. One click converts a conversation you were entitled to think protected into reviewable, retainable, trainable data. The remedies are simple. Do not press the buttons in any conversation that matters. On Claude's Team and Enterprise plans an administrator can disable rating across the organisation; a firm handling client material should.
Anthropic's update this week shows what flagging does to retention. Its retention policy for Mythos-class models, in force from 9 June 2026 and covering Claude Fable 5, requires prompts and outputs to be kept for 30 days for trust and safety purposes on every platform that offers those models, overriding zero-data-retention agreements (under which the provider would otherwise keep nothing) with no opt-out. Our Claude profile carries a bulletin on the change. The carve-out matters more than the headline period: after 30 days the data is deleted automatically "except in the rare cases where it's part of a safety investigation or we're legally required to keep it", with no outer limit stated, though every instance of human access is recorded in a tamper-proof log. Anthropic's standing retention policy puts numbers on the flagged path: a conversation flagged as violating its usage policy is kept for up to two years, and trust and safety classification scores for up to seven. The pattern is general: deletion promises are written subject to safety carve-outs, and a flag is the event that suspends them.
A flag can also feed training. Anthropic's consumer privacy policy states that content flagged for harmful material is disassociated from your user ID and used "to train our trust and safety internal classification and generative models", whatever your model-training setting says. Google's reviewers annotate the sampled conversations precisely so the models improve, and reviewed chats are not deleted when you delete your activity. These uses bear on the Privacy Act analysis. As our privacy page explains, a provider that processes material only for you is your processor: under section 11 of the Privacy Act 2020 the information is treated as held by you, and no disclosure arises. Section 11(3) draws the line: the provider is treated as holding the information itself "if [it] uses or discloses the information for its own purposes". Safety review for the provider's own programme, retention for its own investigations, and training on flagged or feedback material are uses for its own purposes. Where personal information is involved, that engages information privacy principle 12 (IPP 12), the Privacy Act's control on sending personal information overseas, which permits disclosure to a foreign person or entity only on defined grounds: in practice, reasonable grounds to believe the recipient is required to protect the information with safeguards comparable, overall, to the Privacy Act. The processor footing on our privacy page holds for the ordinary run of material, which no classifier will ever notice. It is weakest for exactly the material that gets flagged. For anything a classifier might plausibly flag, assume the safety carve-outs may operate, and run the IPP 12 analysis on that footing.
Our view
Human review is the sharpest of the safety carve-outs, and the attention it is getting is deserved. For most users on most days it never happens, and the no-training, access-controlled plans we point to elsewhere remain the right footing for confidential work. But the carve-outs are real, they survive every tier including zero-data-retention arrangements, and they are aimed at the subject matter that fills a litigation or criminal practice. Treat the feedback buttons as a disclosure switch, and turn them off across your organisation where the plan allows. Assume that, on any plan, anything you enter describing violence, children at risk or serious offending may be read by a person; keep that material out unless you and your client can live with the provider's review path. How that applies to your matter is your judgment, on your facts.