That's similar to the programs used by lawyers to detect plagiarism for their clients, and the programs used by professors and teachers to detect plagiarism in their students. That process is called textual analysis or literary analysis (if it's about literary exemplars).

craigr wrote:
De-anonymizing is not fiction! I had considered writing a tool to do this years back but decided it could only be used for bad purposes and dropped it. I have since seen early versions of tools that do this, however. They use word patterns, sentence structure, typos, etc. to build a confidence score that the same person is behind multiple sets of text. I am sure the technology is far more advanced now.
The idea is that writers maintain the same sentence structure, language patterns, and so on across a variety of books, articles, and posts. Beyond computer analysis of word choice, the field also takes in handwriting analysis. Donald Wayne Foster, a Vassar English professor and forensic linguist, used it to determine that Joe Klein was the author of Primary Colors, the novel about the 1992 Clinton campaign that was published anonymously during Bill Clinton's first term. Klein eventually came forward and admitted it was his work.
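For anyone curious what the word-pattern approach might look like in practice, here is a minimal Python sketch (purely hypothetical, not based on craigr's tool or Foster's methods; the function names and word list are my own). It builds a function-word frequency profile for two text samples and reports a cosine-similarity score as a rough same-author confidence. Real stylometry systems would also use sentence-length statistics, punctuation habits, character n-grams, recurring typos, and far larger feature sets.

[code]
# Hypothetical stylometry sketch: score how similar two writing samples are
# based on the relative frequency of common function words.
import math
import re
from collections import Counter

# A tiny illustrative feature set; real profiles use hundreds of features.
FUNCTION_WORDS = [
    "the", "of", "and", "to", "a", "in", "that", "is", "was", "it",
    "for", "on", "with", "as", "but", "at", "by", "not", "this", "which",
]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = max(len(words), 1)
    counts = Counter(words)
    return [counts[w] / total for w in FUNCTION_WORDS]

def similarity(text_a, text_b):
    """Cosine similarity between two profiles: 0.0 (unrelated) to 1.0 (identical)."""
    a, b = profile(text_a), profile(text_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

if __name__ == "__main__":
    sample_a = "The idea is that a writer keeps the same habits across many texts."
    sample_b = "The claim is that an author keeps the same patterns across many posts."
    print(f"same-author confidence: {similarity(sample_a, sample_b):.3f}")
[/code]

On longer samples the gap between same-author and different-author scores tends to widen, which is essentially the "confidence factor" craigr describes.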
But forensic linguists have gotten into trouble, including Foster, who was sued over his work on the 2001 anthrax case: he fingered Steven Hatfill as the likely culprit, and Hatfill was later exonerated.