Most of the legal profession's anxiety about generative artificial intelligence centers on what appears on the page: invented citations, fabricated quotations, statutes that do not exist. That concern is warranted, and disciplined attorneys now scrub document text accordingly. But a quieter risk lives below the visible content, in the hidden fields that describe a file. When an AI tool drafts a document, it can populate or alter that metadata with author names, dates, version histories, and even hash values that are entirely fabricated. These hallucinations look authoritative, are rarely reviewed, and can quietly distort the very record that authentication and discovery disputes turn on.
Metadata is evidence, not a technical footnote
Metadata is often described as "data about data": the embedded layer of information that every digital file carries about itself. For the documents attorneys and their clients handle most (Microsoft Office files, PDFs, and images), that layer typically records the author and organization; creation, last-modified, and print dates; version history and tracked changes; and embedded comments or hidden notes. Email metadata is richer still, with headers describing sender, recipient, routing, and authentication.
What gives metadata its evidentiary weight is its provenance. Historically it was generated automatically by the operating system and application as a user created, saved, sent, or altered a file, in the background, without the user's involvement. That is precisely why courts and counsel treat it as a kind of digital audit trail: a record nobody sat down and typed. Parties frequently must produce electronically stored information in native format under the Federal Rules of Civil Procedure, because converting files to a secondary production format can strip the metadata that establishes when and by whom a document was made. Under Fed. R. Evid. 901 and Fed. R. Evid. 902, that same metadata is routinely used to authenticate documents and support admissibility. For how those fields are read in practice, see our discussion of metadata forensics.
Why AI fabricates metadata
The mechanism is the same one that produces fake citations in body text. Large language models do not retrieve or "know" facts; they predict the most probable next element in a sequence based on training patterns. As a 2025 research paper from OpenAI describes, the procedures used to train and evaluate these systems tend to reward confident guessing over admitting uncertainty. The model behaves like an over-eager junior associate who would rather supply a plausible answer than leave a field blank.
Applied to document generation, that tendency reaches the metadata as well as the prose. A "complete" file of a given type is expected to carry an author, a creation date, a version history. Faced with no genuine value for those fields, the model fills them with placeholders or approximations drawn from its training data. The output looks like ordinary, system-generated metadata. It is not. The risk compounds when AI is embedded directly in tools such as Microsoft Word or Google Docs, which already write metadata by default. The result can be a hybrid file in which the application attributes authorship to the actual user while the AI layer inserts a conflicting, invented "Author" into the same document.
Where the hallucinations hide
Fabricated metadata is not confined to obscure technical fields. It appears at two distinct depths, and both matter in litigation.
At the surface level, hallucinated values appear in the document "properties" any user can open with a few clicks: the Author, Company, and Created or Last Modified fields visible through the interface of Word, Excel, PowerPoint, or Acrobat. Because these fields were historically machine-generated, attorneys, experts, and even courts tend to presume they are valid. In one illustrative test, a popular AI tool asked only to "create a PowerPoint" produced a file whose Created and Last Modified dates were set more than a decade in the past and whose Last Modified By field named a person wholly unconnected to the document. That fabrication was not random: the name belonged to the real developer of an open-source library the AI used to build the file, and the dates tracked that library's early code-commit history. The model had stitched together genuine but contextually false information, which is exactly what makes this category of error so deceptive.
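For readers who want to see where those surface-level properties live, the following is a minimal sketch in Python. It assumes the file is an Office Open XML document (.docx, .xlsx, .pptx) that stores its core properties at the conventional docProps/core.xml location; the function and variable names are illustrative, not part of any forensic product. It reports only what the file says about itself, which is the point: nothing here verifies that the values are true.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard XML namespaces used by the OOXML core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative label -> element path inside docProps/core.xml.
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(source):
    """Return the self-reported core properties of an OOXML file.

    `source` is a path or a binary file-like object.
    Fields absent from the file map to None.
    """
    with zipfile.ZipFile(source) as archive:
        # Assumption: the core-properties part sits at its conventional path.
        root = ET.fromstring(archive.read("docProps/core.xml"))
    return {
        label: root.findtext(path, default=None, namespaces=NS)
        for label, path in FIELDS.items()
    }
```

Calling `read_core_properties("deck.pptx")` (a hypothetical file name) returns a dictionary of the four fields. Each value is a claim made by the file and should be treated as a lead to corroborate, not as a finding.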
At the forensic level, the same fabrications reach the deeper metadata visible only through tools such as X-Ways, EnCase, or Magnet Axiom, including version histories, internal comments, and even hash values. In the same test, the forensic output reported MD5 and SHA-1 hashes and a comment field disclosing that the file was generated programmatically, all of them values written by the AI process rather than the user's system. This is the more dangerous layer. Forensic tools report what a file contains, not whether those contents are truthful, so a hallucinated hash or timestamp can carry the full weight of forensic authority and be treated as ground truth unless someone cross-checks it.
The dual risk is the point: counsel can be misled by hallucinated values in a file's visible properties, and a court or expert can be misled by the same values surfaced through forensic analysis. A forensic tool faithfully reporting fabricated metadata does not make the metadata true.
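One way to cross-check a reported hash is to recompute it from the file's actual bytes. The sketch below uses Python's standard hashlib module; the function names and the shape of the `reported` dictionary are assumptions made for illustration. It is not a substitute for a validated forensic workflow, only a demonstration that a digest should be derived from the bytes rather than read from a field.

```python
import hashlib


def recompute_digests(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Hash the file's bytes in chunks and return {algorithm: hex digest}."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large evidence files fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def compare_reported(path, reported):
    """Return {algorithm: bool}: does each claimed digest match the file's bytes?

    `reported` maps an algorithm name to the hex digest someone claims,
    for example a value copied from a report or from a field in the file.
    """
    actual = recompute_digests(path, algorithms=tuple(reported))
    return {
        name: actual[name] == claimed.strip().lower()
        for name, claimed in reported.items()
    }
```

A mismatch does not by itself prove fabrication, since a file may have changed after the hash was recorded, but it does show that the reported value cannot be taken at face value.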
How a forensic examiner detects fabricated metadata
Defensible authentication of an AI-era document does not rest on any single field. It rests on internal consistency and corroboration, the same provenance discipline that governs authentication generally under Rule 901 and Rule 902. A qualified examiner asks whether a file's self-reported history can be reconciled with independent evidence.
- Reconcile internal dates against each other and against external anchors (file-system timestamps, server and email logs, custodial records) to confirm that a "created" date does not predate the technology or the matter.
- Recompute hash values independently rather than trusting a hash stored inside the file, and treat any embedded generation comments or tool signatures as leads, not conclusions.
- Compare the questioned document's metadata profile against known-authentic exemplars from the same custodian, system, or software to expose fields that do not fit the expected pattern.
- Trace the file to its native source and full chain of custody rather than a downstream production copy that may have lost or rewritten provenance.
- Separate benign, explainable artifacts (format conversion, normal editing) from indicia of fabrication, and state plainly what the available evidence can and cannot establish.
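The date-reconciliation step in the list above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions: timestamps arrive as ISO 8601 strings, a timestamp without a time zone is treated as UTC, and the anchor labels are whatever independent events the examiner has documented. The function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_utc(value):
    """Parse an ISO 8601 timestamp such as '2013-01-26T08:16:37Z' as aware UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption for this sketch: a naive timestamp is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def reconcile_dates(embedded, anchors):
    """Return a list of human-readable flags; an empty list means no conflict.

    embedded: {'created': ..., 'modified': ...} as self-reported by the file.
    anchors:  {label: timestamp} for independent events the file should not
              predate, such as the release of the software that produced it.
    """
    flags = []
    created = parse_utc(embedded["created"])
    modified = parse_utc(embedded["modified"])
    if modified < created:
        flags.append("modified precedes created")
    for label, stamp in anchors.items():
        if created < parse_utc(stamp):
            flags.append(f"created precedes {label}")
    return flags
```

A flag here is a lead, not a conclusion. A creation date that predates the matter may have a benign explanation, such as a reused template, which is why the final bullet's distinction between explainable artifacts and fabrication still requires human judgment.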
This is also where independent review earns its keep. When metadata is the linchpin of an opposing party's authenticity argument, a structured review of the opposing forensic report can test whether the metadata was independently verified or simply read off the file and reported as fact.
What this means for counsel
The duty to verify AI output does not stop at the visible text. An attorney who carefully scrubs every citation in a brief can still file a document whose embedded properties name the wrong author, the wrong firm, or an impossible date. Those fields may later be read against the client in discovery, authentication, or privilege disputes. The prudent course is to treat metadata as a first-class evidentiary issue: examine native-file properties before production, preserve true provenance, and retain forensic expertise early when authenticity is genuinely contested. As with any reliability question, the right moment to surface it is at case intake, not on the eve of a hearing.
Authorities & further reading
- Fed. R. Evid. 901
- Fed. R. Evid. 902
- Gauthier v. Goodyear Tire & Rubber Co., No. 1:23-cv-281, 2024 BL 431433 (E.D. Tex. 2024)
- Coomer v. Lindell, No. 1:22-cv-01129-NYW-SBP (D. Colo. July 7, 2025)
Adapted from Law & Forensics continuing-legal-education and seminar materials (2025–2026). This article is general information for attorneys and is not legal advice; it does not create an attorney-client, expert, or consulting relationship.