Adrian de Wynter

AI Scientist · Microsoft & The University of York

I am a principal scientist at Microsoft and a researcher (PGR) at the University of York. I work on projects related to fundamental problems in AI and science, such as cognition, measurement, and social impact.

Recent Work & News

May 2026 Talk

Word Is Not Only Text: The Document Manipulation Benchmark

Invited talk at LREC Industry Day (2026; and panel!)

Slides
May 2026 Talk

Will GPT-4 (and 5!) Run DOOM?

Invited talk at i3D 2026 on my paper "Will GPT-4 Run DOOM?"

Slides (auto-downloads)
May 2026 Preprint

The Hrunting of AI: Where and How to Improve English Dialectal Fairness

With Wei Li. We show that improving dialectal performance in LLMs requires more powerful algorithmic techniques than statistics alone.

PDF | Code | BibTeX
May 2026 ACL (Findings) 2026

The Thin Line Between Comprehension and Persuasion in LLMs

With Tangming Yuan. We posit that LLMs have a shallow, but not entirely void, understanding.

PDF | Code | BibTeX | Press
May 2026 ACL GEM Workshop 2026

Evaluating Style-Personalized Text Generation: Challenges and Directions

With Anubhav Jangra, Bahareh Sarrafzadeh, Silviu Cucerzan, and Sujay Kumar Jauhar (co-PI). We find that personalisation measurements are highly unreliable across metrics and approaches.

PDF | BibTeX
April 2026 ICLR 2026

Is In-Context Learning Learning?

It is, but it is very weak and brittle. Chosen by the Turing Post as one of 2025's AI papers you must read.

PDF | Code | BibTeX | Press
March 2026 News

Thinking about Thinking Fellowship!

Excited to have been named a Thinking about Thinking fellow for 2026!

Read more
October 2025 Talk

On Interpreting Measurement with LLMs

Invited talk at MBZUAI's research showcase (2025)

Slides
September 2025 Preprint

Algorithmically Establishing Trust in Evaluators

A cryptographically secure way to perform LLM-based evaluations (and avoid cheating!)

PDF | Code | BibTeX | Press

Media Coverage

Extended Biography

Adrian de Wynter is a principal scientist at Microsoft and a researcher (PGR) at the University of York. His main work is related to fundamental problems in AI and science, such as cognition (e.g., understanding, reasoning) in machines, measurement (in science, like evals), and social impact.

His primary research interest is the interplay of intelligence and reasoning between humans and machines. This touches upon foundational aspects of AI, such as the question of whether showcasing understanding is required for 'good' dialogue (it's not); or whether LLMs are able to learn, as opposed to relying on their intrinsic knowledge (they can). He also studied the ability of LLMs/agents to interact with their environment (by using DOOM!). Measurement-wise, he developed an algorithm with cryptographic guarantees for determining trust in evaluators (e.g., LLMs-as-judges).

Earlier theoretical work showed with category theory that some prompting strategies are (formally) better than others, and how (and when) LLM-based data augmentation works. Even earlier research--because apparently he's getting old--covered mathematical aspects of neural architecture search (NAS), such as undecidability and approximation algorithms.

His other research interests include recreational mathematics (games), preserving endangered languages, and computational social science. In the latter he has worked on mitigating toxicity, unfairness, and other harms of LLMs; personalisation and sycophancy in LLMs; research on research (e.g., LLMs); and did one of the first studies of the impact of ChatGPT on loneliness. He also has stuff in SIGBOVIK.

In terms of service, he has--like everyone and their grandma--reviewed for AAAI, *ACL, NeurIPS, ICML, ICLR and so on (and was named a gold reviewer, which honestly is really nice). He also reviews for Nature (AI & Ethics; Artificial Intelligence, and Communications), IEEE Transactions on Games, and ACM TIST. He was recently named a Thinking About Thinking fellow.

In his spare time, Adrian enjoys photography, cooking, type 2 fun, and divulging personal information to strangers on the internet.

Curriculum Vitae

Download the full CV as a PDF for your records (I haven't uploaded it yet).

Download PDF
Note: I'll be filling this out when I have time

Service & Leadership

(Conference) Reviewer
AAAI, *ACL, NeurIPS, ICML, ICLR
2022ish – Present. Named a gold reviewer, which honestly is really nice
(Journal) Reviewer
Nature (AI & Ethics; Artificial Intelligence, and Communications)
2024 – Present
(Other) (Journal) Reviewer
IEEE Transactions on Games, ACM TIST
2024 – Present

Research Statement

Vision & Overview

I haven't filled this out loloops

Publications

2026

3
Word Is Not Only Text: The Document Manipulation Benchmark
Adrian de Wynter
Invited talk at LREC Industry Day (2026; and panel!)
3
Will GPT-4 (and 5!) Run DOOM?
Adrian de Wynter
Invited talk at the 2026 ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games
4
Evaluating Style-Personalized Text Generation: Challenges and Directions
Anubhav Jangra, Bahareh Sarrafzadeh, Silviu Cucerzan, Adrian de Wynter*, and Sujay Kumar Jauhar*
ACL GEM Workshop 2026 (ann. 2025)
5
The Thin Line Between Comprehension and Persuasion in LLMs
Adrian de Wynter and Tangming Yuan
ACL 2026 (Findings; ann. 2025)
1
The Hrunting of AI: Where and How to Improve English Dialectal Fairness
Wei Li and Adrian de Wynter
Preprint (2026)

2025

2
Causal Reasoning Favors Encoders: On The Limits of Decoder-Only Models
Amartya Roy, Elamparithy M, Kripabandhu Ghosh, Ponnurangam Kumaraguru, and Adrian de Wynter
Preprint
3
On Interpreting Measurement with LLMs
Adrian de Wynter
Invited talk at MBZUAI's research showcase (2025)
6
Labelling Data with Unknown References
Adrian de Wynter
Preprint
7
Does using LLMs in daily life help or hinder learning a second language?
Wei Li, Andy Zhao, Adrian de Wynter, Si-Qing Chen, Paul Karimov, and Joshua K. Hartshorne
CogSci (2025)
8
A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications
Sunayana Sitaram, Adrian de Wynter, Isobel McCrum, Qilong Gu, and Si-Qing Chen
EMNLP 2025 Main
9
If Eleanor Rigby Had Met ChatGPT: A Study on Loneliness in a Post-LLM World
Adrian de Wynter
ACL 2025 Main (ann. 2024)
10
Awes, Laws and Flaws of Today's LLM Research
Adrian de Wynter
ACL 2025 Findings (ann. 2024)
11
Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks
Fangru Lin, Shaoguang Mao, Emanuele La Malfa, Valentin Hofmann, Adrian de Wynter, Xun Wang, Si-Qing Chen, Michael J. Wooldridge, Janet B. Pierrehumbert, Furu Wei
ACL 2025 Main (ann. 2024)
13
RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?
Adrian de Wynter et al.
AAAI 2025 (ann. 2024)
14
LLMs Are All You Need
Adrian de Wynter
SIGBOVIK 2025 (IKYK)

2024

15
Will GPT-4 Run DOOM?
Adrian de Wynter
IEEE Transactions on Games (2024)
16
LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models
Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Adrian de Wynter, Yan Xia, Wenshan Wu, Ting Song, Man Lan and Furu Wei
COLM 2024
17
"I'd Like to Have an Argument, Please": Argumentative Reasoning in Large Language Models
Adrian de Wynter and Tangming Yuan
COMMA 2024 (ann. 2023)
18
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
Rishav Hada, Varun Gumma, Adrian de Wynter, Harshita Diddee, Mohamed Ahmed, Monojit Choudhury, Kalika Bali, and Sunayana Sitaram
EACL 2024 (ann. 2023)
19
An Algorithm for Learning Smaller Representations of Models With Scarce Data
Adrian de Wynter
Information Geometry (2024; ann. 2020)

2023

21
On Meta-Prompting
Adrian de Wynter, Xun Wang, Qilong Gu, and Si-Qing Chen
Preprint
22
An Evaluation of LLM Outputs: Discourse and Memorization
Adrian de Wynter, Xun Wang, Alex Sokolov, Qilong Gu, and Si-Qing Chen
The Natural Language Processing Journal
23
On the Opportunities and Dangers of LLM-Based Evaluation
Chris Quirk and Adrian de Wynter
Invited talk at the 2023 MLADS Conference
24
The Curse of the Biased Researcher: Common Pitfalls in LLM-based Evaluation
Adrian de Wynter
Invited talk at the 2023 MLADS Conference
25
A User-Centered Evaluation of Spanish Text Simplification
Adrian de Wynter, Anthony Hevia, and Si-Qing Chen
Preprint

Older

27
Turing Completeness and Sid Meier's Civilization
Adrian de Wynter
IEEE Transactions on Games (2022)
28
Bort: Algorithms and Applications
Adrian de Wynter
Invited talk at the 2021 Alexa Prize Summit
29
Optimal Subarchitecture Extraction for BERT
Adrian de Wynter and Daniel J. Perry
Preprint (2020)
30
An Approximation Algorithm for Optimal Subarchitecture Extraction
Adrian de Wynter
Preprint (2020)
31
Mischief: A Simple Black-Box Attack Against Transformer Architectures
Adrian de Wynter
Preprint (2020)
32
Harder Performance Measures for Language Models
Adrian de Wynter
Invited talk at the 2020 Alexa Prize Summit
33
On the Bounds of Function Approximations
Adrian de Wynter
ICANN 2019 (oral presentation)

Writings