Advances in Authorship Analysis and Adversarial Stylometry
Authorship Analysis (AAn) is a Natural Language Processing (NLP) task that aims to infer characteristics of the author of a linguistic text. These characteristics may include the author’s identity as well as biographical and sociolinguistic information, such as age, gender, native language, and political orientation. AAn has important applications in areas such as cultural heritage, forensic linguistics, and cybersecurity. In cybersecurity, it can be used to detect, discourage, or trace criminal activities including phishing, cyberbullying, and identity theft.
A large body of research has focused on applying AAn techniques to online communication, including emails, blogs, social media posts, and tweets. In particular, authorship analysis can support the monitoring of harmful or illegal content shared on social media platforms and help identify posts that violate platform policies or legal regulations.
At the same time, AAn raises significant privacy concerns. Its ability to de-anonymize authors or link pseudonymous identities may endanger individuals such as whistleblowers, journalists, or political activists. As a result, increasing attention has been devoted to methods for intentionally modifying writing style in order to conceal authorial identity and personal characteristics. This task, commonly referred to as adversarial stylometry or authorship obfuscation, seeks to reduce the effectiveness of stylometric analysis. This talk provides an overview of the field, its main applications, and recent developments.