Adrian de Wynter

I am a principal scientist at Microsoft and a researcher (PGR) at the University of York. I work on projects related to fundamental problems in AI and science, such as reasoning, measurement, and social impact.

My primary research interest is reasoning as it relates to language in humans and machines. Lately I have focused on LLM reasoning capabilities (e.g. in measurement and strategic reasoning). I have also raised the question (here) of whether understanding, as measured Wittgenstein-style, is required for 'good' dialogue, and demonstrated that in-context learning is a weak form of learning.

I favour pragmatic/intuitionistic approaches, where proofs of complexity and convergence must be constructive. Since I work in industry, these solutions typically apply to production problems. For example, we used category theory to prove that some prompting strategies are objectively better than others, and that users prefer the outcomes they produce (this ended up being part of Copilot!). I also recently wrote an algorithm with cryptographic guarantees for determining trust in LLMs-as-judges.

In earlier work I showed that finding a globally optimal solution to model compression is undecidable, but proved that polytime approximation algorithms exist. I applied these results to BERT, reaching the (then) SOTA in model compression. This was later adapted to quantum circuit optimisation in work at ORNL. I also proved, bridging learning theory and topological data analysis, how (and when) LLM-based data augmentation works.

My other research interests include recreational mathematics (games), preserving endangered languages, and computational social science. In the latter, I have worked on mitigating toxicity, unfairness, and other harms of LLMs; personalisation and sycophancy in LLMs; and research on LLM research. I also did one of the first studies of the impact of ChatGPT on loneliness. And I publish in SIGBOVIK, because this job is actually fun.

In terms of service, I have, like everyone and their grandma, reviewed for AAAI, *ACL, NeurIPS, ICLR, and so on. I also review for Nature (AI & Ethics, Artificial Intelligence, and Communications), IEEE Transactions on Games, and ACM TIST.

If you are stalking me, here's my Google Scholar and LinkedIn. I was recently named a Thinking About Thinking fellow for 2026, if that means anything to you (it does to me :)). Media coverage of my work is below.

Last updated: Mar '26.

New and Selected Works

Following Larry Wasserman's essay, I invite comments on the papers below. Feel free to email me.
For a complete list of works, see here.
For how to handle my last name's weird spelling rules, see here.

The Hrunting of AI: Where and How to Improve English Dialectal Fairness

[pdf] [BibTex] [Code]

Wei Li and Adrian de Wynter

Preprint (2026)

Causal Reasoning Favors Encoders: On The Limits of Decoder-Only Models

[pdf] [BibTex]

Amartya Roy, Elamparithy M, Kripabandhu Ghosh, Ponnurangam Kumaraguru, and Adrian de Wynter

Preprint (2025)

On Interpreting Measurement with LLMs

[slides]

Adrian de Wynter

Invited talk at MBZUAI's research showcase (2025)

Is In-Context Learning Learning?

[pdf] [BibTex] [Code] [Press coverage]

Adrian de Wynter

ICLR (2026). Chosen by the Turing Post as one of 2025's AI papers you must read.

Evaluating Style-Personalized Text Generation: Challenges and Directions

[pdf]

Anubhav Jangra, Bahareh Sarrafzadeh, Silviu Cucerzan, Adrian de Wynter*, and Sujay Kumar Jauhar*

Preprint (2025)

The Thin Line Between Comprehension and Persuasion in LLMs

[pdf] [BibTex] [Code] [Press coverage]

Adrian de Wynter and Tangming Yuan

Preprint (2025)

Algorithmically Establishing Trust in Evaluators

[pdf] [BibTex] [Code] [Press coverage]

Adrian de Wynter

Preprint (2025)

Does using LLMs in daily life help or hinder learning a second language?

[pdf] [BibTex]

Wei Li, Andy Zhao, Adrian de Wynter, Si-Qing Chen, Paul Karimov, Paul and Joshua K. Hartshorne

Proceedings of the Annual Meeting of the Cognitive Science Society (2025)

A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications

[pdf] [BibTex] [Code]

Sunayana Sitaram, Adrian de Wynter, Isobel McCrum, Qilong Gu, and Si-Qing Chen

EMNLP 2025 Main

If Eleanor Rigby Had Met ChatGPT: A Study on Loneliness in a Post-LLM World

[pdf] [BibTex] [Code]

Adrian de Wynter

ACL 2025 Main

Awes, Laws and Flaws of Today's LLM Research

[pdf] [BibTex] [Code]

Adrian de Wynter

ACL 2025 Findings

LLMs Are All You Need

[pdf]

Adrian de Wynter

SIGBOVIK 2025 (IKYK)

Will GPT-4 Run DOOM?

[pdf] [BibTex] [Code] [Post] [Press coverage]

Adrian de Wynter

IEEE Transactions on Games (2024)

LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models

[pdf] [BibTex]

Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Adrian de Wynter, Yan Xia, Wenshan Wu, Ting Song, Man Lan, and Furu Wei

COLM 2024

Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks

[pdf] [BibTex]

Fangru Lin, Shaoguang Mao, Emanuele La Malfa, Valentin Hofmann, Adrian de Wynter, Xun Wang, Si-Qing Chen, Michael J. Wooldridge, Janet B. Pierrehumbert, and Furu Wei

ACL 2025 Main

RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?

[pdf] [BibTex] [Code]

Adrian de Wynter et al.

AAAI 2025

On Meta-Prompting

[pdf] [BibTex] [Code]

Adrian de Wynter, Xun Wang, Qilong Gu, and Si-Qing Chen

Preprint

An Evaluation of LLM Outputs: Discourse and Memorization

[pdf] [BibTex]

Adrian de Wynter, Xun Wang, Alex Sokolov, Qilong Gu, and Si-Qing Chen

The Natural Language Processing Journal

"I'd Like to Have an Argument, Please": Argumentative Reasoning in Large Language Models

[pdf] [BibTex] [Code]

Adrian de Wynter and Tangming Yuan

COMMA 2024

Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?

[pdf] [BibTex]

Rishav Hada, Varun Gumma, Adrian de Wynter, Harshita Diddee, Mohamed Ahmed, Monojit Choudhury, Kalika Bali, and Sunayana Sitaram

EACL 2024

Turing Completeness and Sid Meier's Civilization

[pdf] [BibTex] [The Turing Machine in Action]

Adrian de Wynter

IEEE Transactions on Games

An Algorithm for Learning Smaller Representations of Models With Scarce Data

[pdf] [BibTex] [Code]

Adrian de Wynter

Information Geometry (2024)

TL;DRs of Some Papers

I've found it useful to have a series of posts about some of my work. This makes it more accessible and allows me to share my passion for mathematics. I definitely do not proofread these.
I'm absolutely terrible at updating this site (record: 2 years), so bear with me.

The No-Data Algorithm

How to enable trust in LLMs-as-judges WITHOUT labelled data! With proofs!

Will GPT-4 Run DOOM?

Yes, but no. Links to code, resources, TL;DR of the paper, and videos of the model playing the game.

Turing Completeness and Sid Meier's Civilization

Building a literal computer inside Civ

Neural architecture search is undecidable

A summary of my paper 'On The Bounds of Function Approximations.'

Bort

(Provably) optimal model compression with algebraic topology

Selected Media Coverage

Some media coverage of the work I do, in case my posts remain as confusing as the original papers.

The Turing Post selected this paper as one of the 23 papers from 2025 that indicate where AI is headed.

Posts covering my paper 'Is In-Context Learning Learning?'. Other coverage is here and here. I like this one, too: LLM in-context learning (ICL) is learning, but not how you think

Microsoft’s No-Data Algorithm enables trust without labels

Brief coverage (with really cool implications I hadn't thought about) of the No-Data Algorithm.

The Subtle Boundary Between Comprehension and Persuasion in LLMs (The Thin Line Between Comprehension and Persuasion in LLMs)

A post talking about our paper 'The Thin Line Between Comprehension and Persuasion in LLMs' (in Japanese, but it isn't like Google Translate doesn't exist). Other coverage is here (Quantum Zeitgeist).

Microsoft scientist gets AI to play DOOM but then issued a warning

Some of the coverage of the work I did with DOOM and GPT-4. You can also read about it here (Tom's Hardware), here (PC Mag), and here (The Register).

A version of the BERT language model that’s 20 times as fast

Another post edited by Larry Hardesty. This one talks about Bort.

State of the Art in Automated Machine Learning

This is an interview I, along with other researchers, gave for InfoQ about AutoML. It's so interesting to see people of such different backgrounds arriving at the same conclusions :)

Alexa Research Paper Shows Genetic Algorithms Offer Best Solution for Neural Network Optimization

This post nicely sums up my work on NAS/ASP/FA. Other coverage is here (VentureBeat) and here (Amazon Science).

Contact: first-initial-full-last-name-including-tussenvoegsel (at) microsoft.com

Factoid: my ORCID (326797241) is a prime number; it is the sum of the squares of 1715 and 17996; and it is the square root of the sum of the squares of 61726280 and 320914791 (i.e., the hypotenuse of a right triangle with those legs). Yay.
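For anyone who wants to check the factoid, here is a minimal Python sketch (not from the original page; the helper names are made up for illustration). It confirms that 1715^2 + 17996^2 = 326797241, derives the two legs from the identity (a^2 + b^2)^2 = (2ab)^2 + (b^2 - a^2)^2, and tests primality by plain trial division, which is fast enough for a nine-digit number.

```python
# Hypothetical check of the ORCID factoid, standard library only.
from math import isqrt

ORCID = 326797241
A, B = 1715, 17996  # the two numbers whose squares sum to the ORCID


def is_prime(n: int) -> bool:
    """Deterministic trial division by odd numbers up to floor(sqrt(n))."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def pythagorean_legs(a: int, b: int) -> tuple[int, int]:
    """Legs (2ab, |b^2 - a^2|) of a right triangle with hypotenuse a^2 + b^2."""
    return 2 * a * b, abs(b * b - a * a)


sum_of_squares = A * A + B * B
leg1, leg2 = pythagorean_legs(A, B)

print("sum of squares matches:", sum_of_squares == ORCID)
print("legs:", leg1, leg2)
print("hypotenuse check:", leg1 ** 2 + leg2 ** 2 == ORCID ** 2)
print("prime by trial division:", is_prime(ORCID))
```

The legs come out as 61726280 and 320914791, matching the factoid. Trial division is used only because it is simple and deterministic; for much larger numbers a Miller-Rabin test would be the usual choice.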