Adrian de Wynter




I am a principal applied scientist at Microsoft and a researcher (PGR) at the University of York. I work on projects related to natural language understanding and generation, and on fundamental problems in deep learning, such as reasoning and the formal modelling of dialogue, particularly in LLMs.

At Microsoft my work involves leading, designing, and deploying Word and Office AI features and research. These deal with composition (what you see when you type in Word), multilinguality (e.g., expanding products to new markets), measurement (reasoning, automated evaluation), personalisation, and other workstreams. Yes, I also work on buzzwords like 'agentic workflows'. You can see most of this work in Word Copilot.

My primary research interest is reasoning as it relates to language, in humans and machines. Lately I have focused on LLM-based reasoning capabilities (e.g. here, here, and here). My theoretical work is intuitionistic: algorithms have guarantees of complexity and convergence via constructive proofs, and must relate to a realistic (e.g. production) scenario. This yields meaningful answers to complex problems.

For example, we used category theory to prove that some prompting strategies are objectively better than others, and that users would prefer their outputs (this work ended up shipping as a product in Word). I also recently wrote an algorithm with cryptographic guarantees for determining trust in LLMs-as-judges.

In earlier work I showed that finding a globally optimal solution to model compression is undecidable, but proved that polynomial-time approximation algorithms exist, and applied these results to BERT, reaching a (then) state of the art in model compression. This last contribution was later adapted for quantum circuit optimisation in work at ORNL. I also showed (bridging learning theory and TDA) how, and when, LLM-based data augmentation works.

My other research interests relate to recreational mathematics (games), preserving endangered languages, and computational social science. In the latter I have worked on mitigating toxicity and other harms of LLMs, research on LLM research itself, and the very first study of the impact of ChatGPT on loneliness.

Incidentally, I am now on Twitter.

Last updated: July '25.

I've found it useful to have a series of "posts" on the work I do, to make it more accessible and to share my passion for mathematics, especially since I don't have any social media (does LinkedIn count?).
I'm absolutely terrible at updating this site (record: 2 years), so bear with me.

Links to code, resources, TL;DR of the paper, and videos of the model playing the game.
A brief note about my paper "Turing Completeness and Sid Meier's Civilization". We talk about how to execute arbitrary algorithms inside Civ, and what that means for this and other 4X games.
A post on how hard neural architecture search (NAS) and machine learning can be, from a computational perspective. It also discusses the workarounds and applications of this result, with a particular emphasis on why some NAS approaches do not do better than random search. This is a summary of my poorly-titled, ever-misinterpreted paper "On The Bounds of Function Approximations."
A post on the algorithms used to obtain Bort, an optimally compressed version of the BERT language model. This can be viewed as a summary of my papers "Optimal Subarchitecture Extraction for BERT", "An Algorithm for Learning Smaller Representations of Models With Scarce Data", and "An Approximation Algorithm for Optimal Subarchitecture Extraction", albeit less concise than these titles, if you can believe it.

Following Larry Wasserman's essay, I invite comments on the papers below. Feel free to email me.
For a longer, complete list of works see here.
For how to handle my last name's weird spelling rules, see here.

The Thin Line Between Comprehension and Persuasion in LLMs
Adrian de Wynter and Tangming Yuan
Preprint
Labelling Data with Unknown References
Adrian de Wynter
Preprint
A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications
Sunayana Sitaram, Adrian de Wynter, Isobel McCrum, Qilong Gu, and Si-Qing Chen
Preprint
If Eleanor Rigby Had Met ChatGPT: A Study on Loneliness in a Post-LLM World
Adrian de Wynter
Accepted to ACL 2025 Main
Awes, Laws and Flaws of Today's LLM Research
Adrian de Wynter
Accepted to ACL 2025 Findings
Will GPT-4 Run DOOM?
Adrian de Wynter
IEEE Transactions on Games (2024)
LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models
Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Adrian de Wynter, Yan Xia, Wenshan Wu, Ting Song, Man Lan, and Furu Wei
COLM 2024
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks
Fangru Lin, Shaoguang Mao, Emanuele La Malfa, Valentin Hofmann, Adrian de Wynter, Jing Yao, Si-Qing Chen, Michael Wooldridge, and Furu Wei
Accepted to ACL 2025 Main
RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?
Adrian de Wynter et al.
AAAI 2025
On Meta-Prompting
Adrian de Wynter, Xun Wang, Qilong Gu, and Si-Qing Chen
Preprint (2023)
An Evaluation of LLM Outputs: Discourse and Memorization
Adrian de Wynter, Xun Wang, Alex Sokolov, Qilong Gu, and Si-Qing Chen
The Natural Language Processing Journal
"I'd Like to Have an Argument, Please": Argumentative Reasoning in Large Language Models
Adrian de Wynter and Tangming Yuan
COMMA 2024
On the Opportunities and Dangers of LLM-Based Evaluation
Chris Quirk and Adrian de Wynter
Invited talk at the 2023 MLADS Conference
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
Rishav Hada, Varun Gumma, Adrian de Wynter, Harshita Diddee, Mohamed Ahmed, Monojit Choudhury, Kalika Bali, and Sunayana Sitaram
EACL 2024
Turing Completeness and Sid Meier's Civilization
Adrian de Wynter
IEEE Transactions on Games
An Algorithm for Learning Smaller Representations of Models With Scarce Data
Adrian de Wynter
Information Geometry (2024)

Some media coverage of the work I do, in case my posts remain as confusing as the original papers.

Some of the coverage of the work I did with DOOM and GPT-4. You can also read about it here (Tom's Hardware), here (PC Mag), and here (The Register).
Another post edited by Larry Hardesty. This one talks about Bort.
This is an interview I, along with other researchers, gave for InfoQ around AutoML. It's so interesting to see people of such different backgrounds arriving at the same conclusions :)

Contact: first-initial-full-last-name-including-tussenvoegsel (at) microsoft.com

Factoid: my ORCID (326797241) is a prime number; it is expressible as the sum of two squares (1715² + 17996²); and it is the hypotenuse of a Pythagorean triple, since its square is the sum of two squares (61726280² + 320914791²). Yay.
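For the sceptical reader, the factoid is easy to check in a few lines of Python (a throwaway sketch; the naive trial-division primality test is my own illustration, not from any paper):

```python
# Checking the ORCID factoid: 326797241 is prime, is a sum of two
# squares, and is the hypotenuse of a Pythagorean triple.
n = 326797241

def is_prime(m: int) -> bool:
    """Naive trial division; plenty fast for a nine-digit number."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

# Sum of two squares: 1715^2 + 17996^2 == n
sum_of_squares = 1715**2 + 17996**2 == n

# Hypotenuse: 61726280^2 + 320914791^2 == n^2
hypotenuse = 61726280**2 + 320914791**2 == n**2

print(is_prime(n), sum_of_squares, hypotenuse)
```

The two identities are related: if n = a² + b², then n² = (b² − a²)² + (2ab)², which is exactly where the larger pair of numbers comes from.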