Hi, I’m Artyom ([ɐrˈtʲɵm], Артём) Karpov — an AI safety researcher and engineer.
I study how to monitor and control language model agents. My work focuses on two related problems: whether LLMs can secretly collude through steganography — hiding messages in seemingly normal text — and whether they can encode hidden reasoning in their chains of thought to evade monitors. Both problems undermine safe deployment of AI systems.
I have co-authored papers accepted at workshops at AAAI, ICLR, NeurIPS, and other venues, and my research has been funded by large nonprofit organizations. I’ve completed work for the UK AI Security Institute and participated in METR’s evaluation benchmarks. Before pivoting to AI safety in 2022, I spent 15+ years as a software engineer — building emergency response systems, contributing to .NET Core, and developing backend infrastructure serving millions of users.
I hold a degree in Applied Mathematics and have completed intensive ML programs including MATS, ARENA, MLSS, and the Apart Fellowship.
Notable Projects
AI Safety Research (2022–present)
- NEST: Nascent Encoded Steganographic Thoughts (2026) — Evaluation of frontier models’ capabilities for steganographic reasoning in chains of thought.
- The Steganographic Potentials of Language Models (2025) — First-authored paper on eliciting steganographic collusion in LLMs via reinforcement learning. Accepted at AAAI and ICLR workshops.
- Inducing Human-like Biases in Moral Reasoning Language Models (2024) — Fine-tuning BERT-based models on fMRI data. Accepted at a NeurIPS workshop.
- UK AI Security Institute — Backdoor Evaluations (2025) — Delivered LLM backdoor evaluations under a 3-week deadline during the AISI bounty programme.
- METR — Baseline Evaluations (2025) — Participated in evaluations for RE-Bench and SWE-Bench.
- Evaluating and Inducing Steganography in LLMs (2024) — Apart Deception Hackathon project on steganography via RL.
- CCS on Compound Sentences (2024) — Testing Contrast-Consistent Search for eliciting latent knowledge. Blog post.
Software Engineering (2005–2021)
- Contributor to .NET Core (2017) — Accelerated regular expressions 5.5x with a multithreaded LRU cache, merged into the framework used by millions of projects.
- ProWritingAid (2020–2021) — Backend development for a writing tool serving 1.5M+ users. Built an image generation service that increased active subscribers by 146%.
- GameoEmergency (2011–2018) — Built a real-time emergency monitoring system from the ground up on Azure. Doubled the user base and won the SFR award in 2017.
- Dining Philosophers in .NET (2019) — Technical article on concurrency with TPL.
- 94 answers on StackOverflow — 3,571 reputation.
Philosophy (2018–present)
- My Blog on the History of Philosophy (Russian)
- Plato on the problem of knowledge (Russian)
- The trial of Socrates in Plato’s “Apology” (Russian)
- The Pythagoreans on the nature of number (Russian)
You can find me at:
PGP
-----BEGIN PGP PUBLIC KEY BLOCK-----