A groundbreaking study from researchers at ETH Zurich and Anthropic has demonstrated that large language models can autonomously re-identify anonymous online users with unprecedented accuracy, fundamentally challenging the assumption that pseudonymous internet participation offers meaningful privacy protection.

What Happened

Published in February 2026, the paper "Large-scale online deanonymization with LLMs" shows that LLM-powered agents can successfully link pseudonymous accounts to real-world identities across platforms like Hacker News, Reddit, and LinkedIn. The research reports up to 68% recall at 90% precision, a dramatic leap from classical stylometry approaches, which achieved near-0% success rates on the same datasets.
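For readers less familiar with these metrics, here is a minimal sketch of how recall at a fixed precision is computed. The data is a toy example, not the paper's:

```python
# Precision/recall for identity-matching predictions.
# Toy data for illustration only; not the paper's dataset.

def precision_recall(predictions, truth):
    """predictions: {pseudonym: claimed_identity}; truth: {pseudonym: real_identity}."""
    correct = sum(1 for p, ident in predictions.items() if truth.get(p) == ident)
    precision = correct / len(predictions) if predictions else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall

truth = {"user_a": "alice", "user_b": "bob", "user_c": "carol", "user_d": "dan"}
# An attacker commits only to confident guesses; abstaining keeps precision high.
predictions = {"user_a": "alice", "user_b": "bob", "user_c": "mallory"}
p, r = precision_recall(predictions, truth)
print(p, r)
```

Holding precision near 90% means the system only reports matches it is confident in; recall then measures how many of the true identities those confident reports actually cover.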

Perhaps most striking: the LLM agent accomplished in minutes what would have taken human investigators hours or days of painstaking manual work. The cost? Just a few dollars in API calls, according to reports analyzing the research.

Technical Details

The researchers developed a four-stage pipeline that turns raw, anonymous text into identified individuals:

Extract: The LLM analyzes anonymous posts to distill identity-relevant features—profession, location, hobbies, technical background, and subtle writing patterns. Unlike traditional stylometry, which requires predefined feature schemas, the LLM automatically identifies what matters.

Search: Semantic embeddings scan databases containing millions of candidate profiles, identifying potential matches based on extracted characteristics.
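The search stage can be sketched with plain cosine similarity over candidate embeddings; the tiny hand-made vectors below stand in for real text embeddings:

```python
# Sketch of the "Search" stage: rank candidate profiles by embedding similarity.
# Vectors are hand-made toys; a real system embeds the extracted cue summary
# and millions of profiles with an embedding model.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

query = [0.9, 0.1, 0.3]  # embedding of the extracted feature summary
candidates = {
    "profile_1": [0.88, 0.15, 0.25],
    "profile_2": [0.10, 0.90, 0.20],
    "profile_3": [0.40, 0.40, 0.40],
}
ranked = sorted(candidates, key=lambda c: cosine(query, candidates[c]), reverse=True)
print(ranked[:2])  # top-k shortlist handed to the reasoning stage
```

At real scale this brute-force scan would be replaced by an approximate nearest-neighbor index, but the contract is the same: a long candidate list in, a short ranked shortlist out.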

Reason: A more powerful LLM performs pairwise comparisons between candidates, checking for consistency, contradictions, and corroborating evidence. This reasoning step boosted recall from 4.4% to 45.1% in controlled experiments.
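A rule-based stand-in conveys the contract of this stage, though the paper uses an LLM judge rather than fixed rules:

```python
# Sketch of the "Reason" stage: pairwise consistency check between extracted
# cues and one candidate profile. A real system would ask a stronger LLM to
# weigh evidence; this simplistic rule-based version only illustrates the idea.

def consistent(cues: dict, profile: dict) -> bool:
    """Reject on any direct contradiction; unknown fields don't count against."""
    for key, value in cues.items():
        if value is not None and profile.get(key) is not None:
            if str(value).lower() != str(profile[key]).lower():
                return False
    return True

cues = {"profession": "nurse", "location": "Lyon", "hobbies": None}
match = {"profession": "Nurse", "location": "lyon", "hobbies": "cycling"}
mismatch = {"profession": "nurse", "location": "Berlin"}
print(consistent(cues, match), consistent(cues, mismatch))  # True False
```

An LLM judge improves on this by reading free text: it can spot that "works night shifts at a hospital" corroborates "nurse" even though no field matches literally, which is plausibly where the jump from 4.4% to 45.1% recall comes from.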

Calibrate: Confidence scoring filters false positives, with the system achieving 99% precision in limited tests when calibrated appropriately.
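One simple way to implement such calibration, assuming a held-out set of scored matches (the numbers and the function are illustrative, not the paper's method):

```python
# Sketch of the "Calibrate" stage: on held-out (confidence, correct?) pairs,
# find the lowest confidence threshold whose accepted set meets a target
# precision, then only report matches scoring above it.

def threshold_for_precision(scored, target=0.9):
    """scored: list of (confidence, is_correct). Returns the lowest threshold
    reaching the target precision, or None if no threshold does."""
    for thr in sorted({s for s, _ in scored}):
        accepted = [ok for s, ok in scored if s >= thr]
        if accepted and sum(accepted) / len(accepted) >= target:
            return thr
    return None

held_out = [(0.95, True), (0.9, True), (0.85, False),
            (0.8, True), (0.6, False), (0.5, True)]
thr = threshold_for_precision(held_out, target=0.9)
print(thr)
```

Raising the threshold trades recall for precision: the system answers less often, but the answers it does give are far more likely to be right.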

The system demonstrated remarkable robustness, successfully matching users across different Reddit communities and even handling temporal drift—linking posts from the same user written years apart.

Impact Assessment

The implications are staggering. In a real-world test, researchers used their LLM agent to analyze anonymized interview transcripts from Anthropic's Interviewer dataset. The system successfully re-identified 9 out of 125 participants—approximately 7.2% of the anonymized pool.

Separate research from Northeastern University demonstrated similar capabilities, with a professor using an LLM to deanonymize 6 out of 24 scientist interviews in a single day—25% of the sample.

The researchers also tested scenarios where the true match existed among 10,000 candidates. Even under these challenging conditions, the system maintained 9% recall at 90% precision—far exceeding random chance.

Who's affected? Essentially anyone with a pseudonymous online presence linked to their real identity elsewhere. Whistleblowers, activists, victims of abuse, professionals discussing sensitive topics, and anyone relying on anonymity for safety or professional reasons now face significantly elevated risks.

What You Should Do

While perfect protection against LLM-assisted deanonymization may be impossible, you can reduce your risk profile:

Compartmentalize ruthlessly: Use completely separate identities for different contexts. Never cross-reference usernames, email addresses, or identifying details across accounts you wish to keep separate.

Vary your writing deliberately: Stylometry thrives on consistency. Deliberately alter sentence structure, vocabulary, punctuation habits, and tone across different personas. Consider using paraphrasing tools for sensitive communications.

Minimize biographical leakage: Avoid mentioning specific details—employers, locations, unique experiences—that appear in your public profiles. The LLM excels at connecting these dots.

Assume linkage is possible: The era of "practical obscurity", in which anonymity was protected mainly by the cost and difficulty of investigation, is over. Update your threat model accordingly.

Consider technical countermeasures: Tools like Tor Browser, VPNs, and compartmentalized devices can prevent IP-based correlation, though they don't address stylometric analysis directly.
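To see why stylometric consistency is so revealing, even a crude function-word fingerprint can separate a matching persona from unrelated text. This is a toy sketch, far simpler than real stylometric analysis:

```python
# Crude stylometry: compare function-word frequencies across texts.
# Real stylometry uses much richer features; this only illustrates the signal.
import re
from collections import Counter

FUNCTION_WORDS = {"the", "of", "and", "to", "a", "in", "that", "is", "i"}

def style_fingerprint(text):
    words = re.findall(r"[a-z']+", text.lower())
    total = max(len(words), 1)
    counts = Counter(w for w in words if w in FUNCTION_WORDS)
    return {w: counts[w] / total for w in FUNCTION_WORDS}

def style_distance(a, b):
    fa, fb = style_fingerprint(a), style_fingerprint(b)
    return sum(abs(fa[w] - fb[w]) for w in FUNCTION_WORDS)

persona_1 = "I think that the point of the test is that it works in practice."
persona_2 = "I think that the idea is that the code works in the end."
unrelated = "Ship broke down. Crew waited. Nobody spoke. Storm passed eventually."
# Same habits (heavy "that"/"the"/"I") yield a smaller distance than unrelated prose.
print(style_distance(persona_1, persona_2) < style_distance(persona_1, unrelated))
```

Even this nine-word feature set links the two personas; deliberately varying sentence structure and vocabulary, as advised above, is about breaking exactly this kind of signal.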

Lessons Learned

This research represents a fundamental shift in the privacy landscape. As Simon Lermen noted, "We demonstrate that large language models (LLMs) fundamentally change this calculus, enabling fully automated deanonymization attacks."

The key insight: anonymity's primary protection was never technical—it was economic. Manual investigation was expensive and time-consuming. LLMs have collapsed that barrier, automating what previously required expert human analysts.

Organizations collecting "anonymized" user data must reconsider their privacy guarantees. Academic researchers, journalists, and companies handling sensitive user information can no longer assume pseudonymization provides meaningful protection.

The research also highlights an uncomfortable truth: as AI capabilities advance, defensive measures must evolve in parallel. The gap between what's technically possible and what's practically accessible has closed dramatically—and our privacy models need urgent updating.

Resources