Adrian de Wynter




I am a principal applied scientist at Microsoft and a researcher (PGR) at the University of York. I work on projects related to natural language understanding and generation, and on fundamental problems in deep learning, such as reasoning and the formal modelling of dialogue, particularly in LLMs.

At Microsoft my work involves leading, designing, and deploying Word and Office AI features and research. These deal with composition (what you see when you type in Word), multilinguality (e.g., expanding products to new markets), measurement (reasoning, automated evaluation), personalisation, and other workstreams. Yes, I also work on buzzwords like 'agentic workflows'. You can see most of this work in Word Copilot.

My primary research interest is reasoning as it relates to language, in humans and machines. Lately I have focused on LLM-based reasoning capabilities (e.g. here, here, and here). My theoretical work is intuitionistic: algorithms have guarantees of complexity and convergence via constructive proofs, and must relate to a realistic (e.g. production) scenario. This yields meaningful answers to complex problems.

For example, we used category theory to prove that some prompting strategies are objectively better than others, and that users would prefer their outputs (this work ended up shipping as a product in Word). I also recently wrote an algorithm with cryptographic guarantees for determining trust in LLMs-as-judges.

In earlier work I showed that finding a globally optimal solution to model compression is undecidable, but proved that polynomial-time approximation algorithms exist, and applied these results to BERT, reaching a (then) state of the art in model compression. This last contribution was later adapted for quantum circuit optimisation in work at ORNL. I also showed (bridging learning theory and TDA) how, and when, LLM-based data augmentation works.

My other research interests relate to recreational mathematics (games), preserving endangered languages, and computational social science. In the latter I have worked on mitigating toxicity and other harms of LLMs, research on LLM research itself, and the very first study of the impact of ChatGPT on loneliness.

Incidentally, I am now on Twitter.

Last updated: July '25.

I've found it useful to have a series of "posts" on the work I do, to make it more accessible and to share my passion for mathematics, especially since I don't have any social media (does LinkedIn count?).
I'm absolutely terrible at updating this site (record: 2 years), so bear with me.

Links to code, resources, TL;DR of the paper, and videos of the model playing the game.
A brief note about my paper "Turing Completeness and Sid Meier's Civilization". We talk about how to execute arbitrary algorithms inside Civ, and what that means for this and other 4X games.
A post on how hard neural architecture search (NAS) and machine learning can be, from a computational perspective. It also discusses the workarounds and applications of this result, with a particular emphasis on why some NAS approaches do not do better than random search. This is a summary of my poorly-titled, ever-misinterpreted paper "On The Bounds of Function Approximations."
A post on the algorithms used to obtain Bort, an optimally compressed version of the BERT language model. This can be viewed as a summary of my papers "Optimal Subarchitecture Extraction for BERT", "An Algorithm for Learning Smaller Representations of Models With Scarce Data", and "An Approximation Algorithm for Optimal Subarchitecture Extraction", albeit less concise than these titles, if you can believe it.

Following Larry Wasserman's essay, I invite comments on the papers below. Feel free to email me.
For a longer, complete list of works see here.
For how to handle my last name's weird spelling rules, see here.

The Thin Line Between Comprehension and Persuasion in LLMs
Adrian de Wynter and Tangming Yuan
Preprint
Labelling Data with Unknown References
Adrian de Wynter
Preprint
A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications
Sunayana Sitaram, Adrian de Wynter, Isobel McCrum, Qilong Gu, and Si-Qing Chen
Preprint
If Eleanor Rigby Had Met ChatGPT: A Study on Loneliness in a Post-LLM World
Adrian de Wynter
Accepted to ACL 2025 Main
Awes, Laws and Flaws of Today's LLM Research
Adrian de Wynter
Accepted to ACL 2025 Findings
Will GPT-4 Run DOOM?
Adrian de Wynter
IEEE Transactions on Games (2024)
LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models
Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Adrian de Wynter, Yan Xia, Wenshan Wu, Ting Song, Man Lan, and Furu Wei
COLM 2024
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks
Fangru Lin, Shaoguang Mao, Emanuele La Malfa, Valentin Hofmann, Adrian de Wynter, Jing Yao, Si-Qing Chen, Michael Wooldridge, and Furu Wei
Accepted to ACL 2025 Main
RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?
Adrian de Wynter et al.
AAAI 2025
On Meta-Prompting
Adrian de Wynter, Xun Wang, Qilong Gu, and Si-Qing Chen
Preprint (2023)
An Evaluation of LLM Outputs: Discourse and Memorization
Adrian de Wynter, Xun Wang, Alex Sokolov, Qilong Gu, and Si-Qing Chen
The Natural Language Processing Journal
"I'd Like to Have an Argument, Please": Argumentative Reasoning in Large Language Models
Adrian de Wynter and Tangming Yuan
COMMA 2024
On the Opportunities and Dangers of LLM-Based Evaluation
Chris Quirk and Adrian de Wynter
Invited talk at the 2023 MLADS Conference
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
Rishav Hada, Varun Gumma, Adrian de Wynter, Harshita Diddee, Mohamed Ahmed, Monojit Choudhury, Kalika Bali, and Sunayana Sitaram
EACL 2024
Turing Completeness and Sid Meier's Civilization
Adrian de Wynter
IEEE Transactions on Games
An Algorithm for Learning Smaller Representations of Models With Scarce Data
Adrian de Wynter
Information Geometry (2024)

Some media coverage of the work I do, in case my posts remain as confusing as the original papers.

Some of the coverage of the work I did with DOOM and GPT-4. You can also read about it here (Tom's Hardware), here (PC Mag), and here (The Register).
Another post edited by Larry Hardesty. This one talks about Bort.
This is an interview I, along with other researchers, gave for InfoQ around AutoML. It's so interesting to see people of such different backgrounds arriving at the same conclusions :)

Contact: first-initial-full-last-name-including-tussenvoegsel (at) microsoft.com

Factoid: my ORCID (326797241) is a prime number; it is expressible as the sum of two squares (1715² + 17996²); and it is the hypotenuse of a Pythagorean triple, since its square is the sum of two squares (61726280² + 320914791²). Yay.
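For the sceptical reader, the factoid is easy to check in a few lines of Python (a throwaway sketch; the naive trial-division primality test is my own illustration, not from any paper):

```python
# Checking the ORCID factoid: 326797241 is prime, is a sum of two
# squares, and is the hypotenuse of a Pythagorean triple.
n = 326797241

def is_prime(m: int) -> bool:
    """Naive trial division; plenty fast for a nine-digit number."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

# Sum of two squares: 1715^2 + 17996^2 == n
sum_of_squares = 1715**2 + 17996**2 == n

# Hypotenuse: 61726280^2 + 320914791^2 == n^2
hypotenuse = 61726280**2 + 320914791**2 == n**2

print(is_prime(n), sum_of_squares, hypotenuse)
```

The two identities are related: if n = a² + b², then n² = (b² − a²)² + (2ab)², which is exactly where the larger pair of numbers comes from.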