Grzegorz (Greg) Głuch

Simons Institute for the Theory of Computing, Melvin Calvin Laboratory, Room 318 · Berkeley · CA 94720

I am a Simons-Berkeley Postdoctoral Researcher hosted by Shafi Goldwasser, currently at MIT. I completed my PhD at EPFL, where I was co-advised by Rüdiger Urbanke and Michael Kapralov. Before that, I studied at the University of Wroclaw, working with Jerzy Marcinkowski.

I work on the theoretical aspects of AI, focusing on AI safety and resilience. I am a member of the Resilience Research Pod at the Simons Institute. My research interests also include quantum complexity theory and its connections to physics.

Trying to make an impact.

Contact: grzegorzgluch93@gmail.com


Talks and Media

Dec 10, 2025

Quanta Magazine

“On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment” was featured in a Quanta Magazine article.

Read the article

London · Oct 30, 2025

UK AISI Alignment Conference

Invited talk

Reasoning Provably Improves Safety for Generation

Oct 21, 2025

New York University

Crypto and Security Seminar

What Can Cryptography Tell Us About AI Safety?

Oct 14, 2025

Massachusetts Institute of Technology

ML + Crypto Seminar

What Can Cryptography Tell Us About AI Safety?


Publications

Authors appear in alphabetical order (a convention in theoretical computer science).

Private Proofs of When and Where

Preprint
Abstract

Position verification schemes are interactive protocols in which entities prove their physical location to others; this enables interactive proofs for statements of the form "I am at location L." Although secure position verification cannot be achieved with classical protocols (even under computational assumptions), it is feasible with quantum protocols. In this paper we introduce the notion of zero-knowledge position verification, which generalizes position verification in two ways: (1) it enables entities to prove more sophisticated statements about their locations at different times (for example, "I was NOT near location L at noon yesterday"); (2) it maintains privacy for every detail about their true location other than the statement they are proving. We construct zero-knowledge position verification from standard position verification and post-quantum one-way functions. The central tool in our construction is a primitive we call position commitments, which allow entities to privately commit to their physical position at a particular moment and reveal it at some later time.

A Cryptographic Perspective on Mitigation vs. Detection in Machine Learning

Preprint
Abstract

In this paper, we initiate a cryptographically inspired theoretical study of detection versus mitigation of adversarial inputs produced by attackers of machine learning algorithms at inference time. We formally define defense by detection (DbD) and defense by mitigation (DbM). Our definitions take the form of a 3-round protocol between two resource-bounded parties: a trainer/defender and an attacker. The attacker aims to produce inference-time inputs that fool the training algorithm. We define correctness, completeness, and soundness properties to capture successful defense at inference time while not degrading (too much) the performance of the algorithm on inputs from the training distribution. We first show that achieving DbD and achieving DbM are equivalent for ML classification tasks. Surprisingly, this is not the case for ML generative learning tasks, where there are many possible correct outputs for each input. We show a separation between DbD and DbM by exhibiting a generative learning task for which it is possible to defend by mitigation but provably impossible to defend by detection, assuming that Identity-Based Fully Homomorphic Encryption (IB-FHE), publicly verifiable zero-knowledge Succinct Non-Interactive Arguments of Knowledge (zk-SNARKs), and strongly unforgeable signatures exist. The mitigation phase uses significantly fewer samples than the initial training algorithm.

On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment

ICLR 2026
Abstract

With the increased deployment of large language models (LLMs), one concern is their potential misuse for generating harmful content. Our work studies the alignment challenge, with a focus on filters to prevent the generation of unsafe information. Two natural points of intervention are the filtering of the input prompt before it reaches the model, and filtering the output after generation. Our main results demonstrate computational challenges in filtering both prompts and outputs. First, we show that there exist LLMs for which there are no efficient prompt filters: adversarial prompts that elicit harmful behavior can be easily constructed, which are computationally indistinguishable from benign prompts for any efficient filter. Our second main result identifies a natural setting in which output filtering is computationally intractable. All of our separation results are under cryptographic hardness assumptions. In addition to these core findings, we also formalize and study relaxed mitigation approaches, demonstrating further computational barriers. We conclude that safety cannot be achieved by designing filters external to the LLM internals (architecture and weights); in particular, black-box access to the LLM will not suffice. Based on our technical results, we argue that an aligned AI system's intelligence cannot be separated from its judgment.

The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses

NeurIPS 2025 · ICML 2024 Workshop
Abstract

We formalize and analyze the trade-off between backdoor-based watermarks and adversarial defenses, framing it as an interactive protocol between a verifier and a prover. While previous works have primarily focused on this trade-off, our analysis extends it by identifying transferable attacks as a third, counterintuitive but necessary option. Our main result shows that for all learning tasks, at least one of the three exists: a watermark, an adversarial defense, or a transferable attack. By transferable attack, we refer to an efficient algorithm that generates queries indistinguishable from the data distribution and capable of fooling all efficient defenders. Using cryptographic techniques, specifically fully homomorphic encryption, we construct a transferable attack and prove its necessity in this trade-off. Furthermore, we show that any task that satisfies our notion of a transferable attack implies a cryptographic primitive, thus requiring the underlying task to be computationally complex. Finally, we show that tasks of bounded VC-dimension allow adversarial defenses against all attackers, while a subclass allows watermarks secure against fast adversaries.

4/3 Rectangle Tiling lower bound

IPL 2025
Abstract

Given an n×n array of positive numbers, the goal is to find a tiling using at most p rectangles that minimizes the maximum rectangle weight (the sum of the entries it covers). We prove that it is NP-hard to approximate this problem within a factor of 4/3.

Nonlocality under Computational Assumptions

STOC 2024 · QIP 2025
Abstract

Nonlocality and its connections to entanglement are fundamental features of quantum mechanics that have found numerous applications in quantum information science. A set of correlations is said to be nonlocal if it cannot be reproduced by spacelike-separated parties sharing randomness and performing local operations. An important practical consideration is that the runtime of the parties has to be shorter than the time it takes light to travel between them. One way to model this restriction is to assume that the parties are computationally bounded. We therefore initiate the study of nonlocality under computational assumptions and derive the following results: (a) We define the set 𝖭𝖾𝖫 (not-efficiently-local) as consisting of all bipartite states whose correlations arising from local measurements cannot be reproduced with shared randomness and polynomial-time local operations. (b) Under the assumption that the Learning With Errors problem cannot be solved in quantum polynomial-time, we show that 𝖭𝖾𝖫=𝖤𝖭𝖳, where 𝖤𝖭𝖳 is the set of all bipartite entangled states (pure and mixed). This is in contrast to the standard notion of nonlocality where it is known that some entangled states, e.g. Werner states, are local. (c) We prove that if 𝖭𝖾𝖫=𝖤𝖭𝖳 unconditionally, then 𝙱𝚀𝙿≠𝙿𝙿. (d) Using (c), we show that a certain natural class of 1-round delegated quantum computation protocols that are sound against 𝙿𝙿 provers cannot exist.

Bayes Complexity of Learners vs Overfitting

Under Review
Abstract

We introduce a new notion of complexity of functions and show that it has the following properties: (i) it governs a PAC-Bayes-like generalization bound, (ii) for neural networks it relates to natural notions of complexity of functions (such as variation), and (iii) it explains the generalization gap between neural networks and linear schemes. Moreover, our notion naturally generalizes to neural networks with several layers. Even though computing our complexity is nontrivial in general, an upper bound is often easy to derive, even for larger numbers of layers and for structured functions such as periodic functions. An upper bound we derive allows us to show a separation, in the number of samples needed for good generalization, between 2- and 4-layer neural networks for periodic functions.

Breaking a Classical Barrier for Classifying Arbitrary Test Examples in the Quantum Model

AISTATS 2023 · QIP 2022 Poster
Informal Abstract

This paper presents a surprising connection between recent advances in delegation of quantum computation and adversarial robustness. Imagine an adversary A sitting between a classifier C and a source S that produces samples to be classified. Can we design an interactive protocol between C and A in which A convinces C that the samples being sent for classification really came from S? Using ideas from delegation of quantum computation, we show this is possible if A receives quantum states from S and communicates with C classically. Why do we need quantum mechanics? Assume that S is uniform on {0,1}^n. If A received classical samples from S, how could A convince C that the samples it sent really came from S and were not handcrafted by A? This seems to us to be an almost impossible problem to solve, as we discuss in the paper.

Noether: The more things change, the more stay the same

Under Review
Informal Abstract

The field of theoretical machine learning has produced some initial explanations for why neural networks perform so well in practice. Unfortunately, when reading these papers it is hard to see a common theme or a unifying message. In this paper we show how to look at many of the recent advances through one lens: the lens of symmetries. We show how this concept explains many previous results and how it helps to obtain new ones. To do so, we study a connection to Noether's theorem from physics.

Exponential Separation between Two Learning Models and Adversarial Robustness

NeurIPS 2021 Spotlight
Informal Abstract

This paper has two sides: the technical one and potential applications to adversarial robustness. On the technical side, we show an exponential separation between two models: the standard PAC-learning model and an Equivalence-Query model. Thinking of adversarial examples as counterexamples, our result can be viewed as theoretical confirmation of findings that structured adversarial examples can speed up learning. We also discuss implications for adversarial robustness beyond norm-bounded threat models.

Query Complexity of Adversarial Attacks

ICML 2021
Informal Abstract

How hard is it to adversarially attack a network with only black-box access? We analyze a spectrum of threat models indexed by query access and give lower bounds, in terms of the entropy of the decision boundaries, on the number of queries needed to match white-box attacks. We also analyze classical learning algorithms on synthetic tasks and prove meaningful security guarantees.

Spectral Clustering Oracles in Sublinear Time

SODA 2021
Abstract

Given a graph G that can be partitioned into k disjoint expanders, we construct a small-space data structure (a spectral clustering oracle) that allows vertices to be quickly classified according to their cluster. Our main result gives sublinear query time with strong per-cluster guarantees, using dot-product access to a spectral embedding via estimates of short random walks and a carefully analyzed linear transformation.

Constructing a provably adversarially-robust classifier from a high accuracy one

AISTATS 2020
Abstract

Given black-box access to a high-accuracy classifier f, we show how to construct a classifier g that remains accurate and is provably robust to adversarial L2 perturbations. Our approach builds on randomized smoothing and uses geometric tools like random partitions and doubling dimension to bound adversarial error in terms of optimal error.

Pre-PhD work

The first order truth behind undecidability of regular path queries determinacy

ICDT 2019
Abstract

We show that determinacy remains undecidable even without regularity assumptions: it is undecidable for finite unions of conjunctive path queries, strengthening prior undecidability results for Regular Path Queries.

Can One Escape Red Chains? Regular Path Queries Determinacy is Undecidable

LICS 2018
Abstract

We solve a long-standing open problem by proving that Query Determinacy is undecidable for Regular Path Queries, a foundational query language for graph databases.


Teaching

Videos from a course on "Quantum Interactive Proofs" that I taught at EPFL can be found here


Education

École Polytechnique Fédérale de Lausanne (EPFL)

Doctor of Philosophy
Computer Science
2018 - 2024

University of Wroclaw

Master of Science
Computer Science
2015 - 2018

University of Wroclaw

Bachelor of Science
Mathematics
2012 - 2015

University of Wroclaw

Bachelor of Science
Computer Science
2012 - 2015