In a world of digital communication, forensic handwriting analysis for criminal investigation is almost a lost and virtually unnecessary art. However, new tools that can analyze the writing style of a purportedly anonymous email or other electronic missive could be coupled with other digital evidence, such as the device used to create the communication and its internet (IP) address, to tie a suspect to a particular note in cases of fraud, terrorism, cyberbullying or other crime.
Emad Abdallah of Hashemite University, in Zarqa, Jordan, and colleagues have developed an algorithm that can rapidly mine anonymous email content and find incriminating similarities between a cyber threat, blackmail email or other malicious missive and earlier emails from known suspects. The team has tested its algorithm extensively and finds that it can identify individual authors from a very limited number of features, given a large enough stock of earlier emails. The new approach side-steps the rather stifling prerequisite of earlier de-anonymizing algorithms that the messages in the database from each suspect must be several thousand words long. The technology could be applied equally to identifying the source of spam, malware, fraudulent messages and other problematic communications.
The algorithm focuses on a small number of features of the writing style: the vocabulary used (including spelling errors), grammatical style and errors, specific identifying content, structural characteristics and idiosyncrasies. The algorithm can home in on whether the author uses the first or third person, whether they adhere to polite or neutral standards of etiquette, whether the context is positive or negative and how emotions play out in the communication. It is also possible to combine this assessment with more conventional checks of writing level, such as the Flesch, Dale-Chall and Gunning fog readability formulas.
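As a rough illustration of what extracting such features can look like, here is a minimal Python sketch. It is not the team's implementation: the word lists, the feature names and the vowel-group syllable estimate are assumptions made for this example. Only the Flesch reading ease formula itself is standard.

```python
import re

# Illustrative word lists (assumptions, not taken from the paper).
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}
THIRD_PERSON = {"he", "him", "his", "she", "her", "hers", "they", "them", "their"}
POLITE = {"please", "thanks", "thank", "regards", "sincerely"}


def count_syllables(word):
    """Crude syllable estimate: number of vowel groups, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def extract_features(text):
    """Return a small dictionary of writing-style features for one email."""
    words = re.findall(r"[A-Za-z']+", text)
    lowered = [w.lower() for w in words]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(1, len(words))          # avoid division by zero
    n_sentences = max(1, len(sentences))
    syllables = sum(count_syllables(w) for w in words)
    return {
        "first_person_rate": sum(w in FIRST_PERSON for w in lowered) / n_words,
        "third_person_rate": sum(w in THIRD_PERSON for w in lowered) / n_words,
        "polite_rate": sum(w in POLITE for w in lowered) / n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": len(words) / n_sentences,
        # Standard Flesch reading ease formula.
        "flesch_reading_ease": 206.835
        - 1.015 * (len(words) / n_sentences)
        - 84.6 * (syllables / n_words),
    }


if __name__ == "__main__":
    sample = "I thank you. Please send me the report."
    for name, value in extract_features(sample).items():
        print(f"{name}: {value:.3f}")
```

A real system would add spelling and grammar error counts, sentiment and structural cues, but even a handful of ratios like these gives each email a compact numerical profile.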
The algorithm is “trained” to recognize the authors of known emails with a large data set of their messages and then tested against an anonymous message. The team demonstrated an accuracy of 80-90% for four “suspects” in mock investigations, even when the number of possible senders is as large as fifty. Even with just five training emails for one suspect, accuracy was just as encouraging. This level of accuracy, coupled with other circumstantial, digital and physical evidence, might be sufficient for a successful prosecution in a case that would otherwise fail.
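The train-then-test workflow can be sketched in a few lines of Python. The article does not describe the team's learning engine, so this example substitutes a simple nearest-centroid classifier. The marker words, the function names and the toy emails are all invented for illustration.

```python
import math
import re

# Hypothetical marker words whose relative frequencies serve as features.
MARKERS = ["i", "we", "you", "please", "thanks", "not", "never", "the"]


def featurize(text):
    """Map an email to a vector of marker-word frequencies."""
    words = re.findall(r"[a-z']+", text.lower())
    total = max(1, len(words))  # avoid division by zero
    return [words.count(marker) / total for marker in MARKERS]


def train(labelled_emails):
    """Average the feature vectors of each author's known emails.

    labelled_emails is an iterable of (author, text) pairs; the result
    maps each author to a centroid vector.
    """
    sums, counts = {}, {}
    for author, text in labelled_emails:
        vec = featurize(text)
        previous = sums.get(author, [0.0] * len(MARKERS))
        sums[author] = [p + v for p, v in zip(previous, vec)]
        counts[author] = counts.get(author, 0) + 1
    return {a: [s / counts[a] for s in sums[a]] for a in sums}


def identify(centroids, anonymous_text):
    """Return the author whose centroid is closest to the anonymous email."""
    vec = featurize(anonymous_text)
    return min(centroids, key=lambda a: math.dist(centroids[a], vec))


# Toy training data, invented for this sketch.
TRAINING = [
    ("alice", "Please send the report. Thanks, we appreciate it."),
    ("alice", "Thanks for the update. Please call when you can."),
    ("bob", "I will not pay. I never agreed to this."),
    ("bob", "I do not care. You never listen and I am done."),
]

if __name__ == "__main__":
    centroids = train(TRAINING)
    print(identify(centroids, "Please forward the invoice, thanks."))
```

With only two emails per author, this toy model is far cruder than the system described, but it shows the shape of the task: build a profile per suspect from known messages, then assign the anonymous message to the nearest profile.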
“The results clearly showed the ability of identifying the authors with very limited number of features,” the team reports. They now plan to study the number of features used in the extraction process, identify optimal two-word phrases, and modify the learning engine to further improve classification performance in the context of email forensics.
Abdallah E.E., Abdallah A.E., Bsoul M., Otoom A.F. & Daoud E.A. (2013). Simplified features for email authorship identification, International Journal of Security and Networks, 8 (2) 72. DOI: 10.1504/IJSN.2013.055941