Arithmetic Coding Steganography Using Frontier Models

I implemented arithmetic coding steganography [1] to hide messages in LLM-generated text. While the proof-of-concept works for short messages on GPT-3.5, it revealed fundamental challenges with API non-determinism and interval collapse that limit practical applications. This post documents the technical approach, the challenges encountered, and the lessons learned.

The code: https://github.com/artkpv/arithmetic-coding-steganography/

Introduction

Can language models hide secret messages in seemingly innocent text? This question led me down a two-month rabbit hole exploring steganographic encoding using arithmetic coding on GPT-3.5. The project, part of my research into encoded reasoning capabilities in AI systems, aimed to demonstrate how models could theoretically perform covert reasoning while appearing to generate normal text.

The motivation came from policy and safety concerns: if AI systems can hide their reasoning processes, how can we ensure alignment and safety? This tool, Arithmetic Coding Steganography, is a step toward a proof-of-concept that demonstrates to policymakers and researchers the potential for encoded reasoning in language models.

Unlike the previous work [1], which used a local GPT-2 model to obtain the token distribution, this tool implements arithmetic coding on top of GPT-3.5 and could potentially be adapted to any model (see the limitations below).

What is Arithmetic Coding Steganography?

Arithmetic coding steganography hides messages by influencing which tokens a language model selects during text generation. Here’s how it works:

  1. Get token probabilities: Query the LLM for the probability distribution of next tokens given the current context
  2. Divide the probability space: Map the probability distribution to numerical intervals
  3. Encode message bits: Select tokens based on which interval contains the encoded message value
  4. Generate natural text: The selected tokens form grammatically correct, semantically meaningful text that secretly encodes the message

The key insight: by carefully selecting from the available tokens based on their probabilities, we can encode arbitrary binary data while maintaining natural-looking text generation.
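
To make the interval arithmetic concrete, here is a minimal sketch of steps 2 and 3. It is not the repository's code: the helper names, the 32-bit precision, and the toy distribution are illustrative.

PRECISION = 32                        # number of bits in the integer coding interval
TOP = 1 << PRECISION                  # the interval is [0, 2**PRECISION)

def split_interval(probs, low, high):
    """Map token probabilities onto disjoint integer sub-intervals of [low, high)."""
    width = high - low
    sub_intervals = {}
    cum = 0.0
    for token, p in probs.items():
        lo = low + int(cum * width)
        cum += p
        hi = low + int(cum * width)
        if hi > lo:                   # drop tokens whose sub-interval would be empty
            sub_intervals[token] = (lo, hi)
    return sub_intervals

def pick_token(sub_intervals, message_value):
    """Pick the token whose sub-interval contains the hidden message value."""
    for token, (lo, hi) in sub_intervals.items():
        if lo <= message_value < hi:
            return token, lo, hi
    raise ValueError("message value falls outside the current interval")

# Toy example: hide the leading bits of the letter "S" (ASCII 01010011).
probs = {" sunny": 0.6, " cloudy": 0.3, " rainy": 0.1}
message_value = int("01010011".ljust(PRECISION, "0"), 2)
print(pick_token(split_interval(probs, 0, TOP), message_value))

In the full algorithm the interval is carried over from step to step rather than reset, and whenever its bounds come to share leading bits, those bits count as encoded and the interval is rescaled.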

An example. The visualization above shows the encoding that results from stego-arith encode -a 'artifacts/' "The weather today is " "S", which produces the encoded text ‘sunny and warm, a perfect day for a picnic at the parkI’. The picture shows each step of the encoding. The red interval marks the next token chosen from those available for the continuation (the divided intervals on a line). The rescaling (dotted lines between steps) happens when the interval bounds share their leftmost bits (“bits=1 0” means 1 bit, namely “0”, was selected). The blue dot on each line represents the current hidden message bits mapped to an integer. Decoding is done with stego-arith decode -a 'artifacts/' "The weather today is " "sunny and warm, a perfect day for a picnic at the parkI", which correctly outputs “S”. Its visualization is shown below; it is the reverse of the encoding, i.e. it restores the hidden message mapped to an integer.

Implementation Journey

Phase 1: Initial Attempts (June 2025)

Started with a LangGraph-based approach, using prompt engineering to make the model encode and decode messages. The model struggled to maintain the encoding scheme consistently across multiple steps, which is no surprise given the complexity of the coding.

Phase 2: Python Implementation (July 2025)

Switched to a direct implementation using OpenAI’s API to get token logprobs. Built the arithmetic coding algorithm from scratch:

# Core encoding logic (simplified)
def encode(context, message_bits):
    interval = [0, 2 ** precision]
    for bit_index in range(len(message_bits)):
        # Get next token probabilities
        logprobs = get_token_logprobs(context)

        # Divide interval based on probabilities
        cum_probs = compute_cumulative_probs(logprobs, interval)

        # Select token based on message bits
        token = select_token(message_bits, cum_probs)

        # Update context and interval
        context += token
        interval = update_interval(token, cum_probs)

    return context
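
Decoding runs the same machinery in reverse: it rebuilds the same sub-intervals from the same context, finds the sub-interval of the token that was actually generated, and reads off the bits that the narrowing interval fixes. Below is a hedged sketch of that direction, reusing split_interval and PRECISION from the earlier sketch and assuming a get_token_probs helper that returns the same (cached) distribution the encoder saw.

def decode(context, tokens):
    """Recover hidden bits from tokens produced by the encoder (simplified sketch)."""
    low, high = 0, 1 << PRECISION
    bits = []
    for token in tokens:
        probs = get_token_probs(context)     # must match what the encoder saw
        low, high = split_interval(probs, low, high)[token]
        # Whenever low and high - 1 agree on their most significant bit, that
        # bit of the message is fixed: record it and rescale the interval
        # (the dotted lines between steps in the visualization).
        while (low >> (PRECISION - 1)) == ((high - 1) >> (PRECISION - 1)):
            bit = low >> (PRECISION - 1)
            bits.append(bit)
            low = (low - (bit << (PRECISION - 1))) << 1
            high = (high - (bit << (PRECISION - 1))) << 1
        context += token
    return bits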

Phase 3: Fighting Non-Determinism (August 2025)

Discovered that OpenAI’s API returns non-deterministic logprobs even with the same seed and parameters. Implemented caching to ensure consistency between encoding and decoding phases.

Technical Challenges Encountered

1. API Non-Determinism

Despite setting seed=42 and verifying the system fingerprint, OpenAI’s API returns different token probabilities for identical requests. This caused encoding/decoding mismatches after ~20-30 bits.

Solution attempted: Implemented a file-based cache to store and reuse API responses, ensuring encoder and decoder see identical probability distributions.
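
Below is a minimal sketch of that idea, assuming the openai v1 Python client; the cache location, file naming, and helper signature are illustrative rather than the repository's exact implementation.

import hashlib
import json
from pathlib import Path

from openai import OpenAI

client = OpenAI()
CACHE_DIR = Path("artifacts/logprob_cache")      # illustrative cache location
CACHE_DIR.mkdir(parents=True, exist_ok=True)

def get_token_logprobs(context, model="gpt-3.5-turbo-1106", top_logprobs=20):
    """Return top next-token logprobs for `context`, reusing a cached response if present."""
    key = hashlib.sha256(f"{model}|{top_logprobs}|{context}".encode()).hexdigest()
    cache_file = CACHE_DIR / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": context}],
        max_tokens=1,
        seed=42,                                 # requested, but not a determinism guarantee
        logprobs=True,
        top_logprobs=top_logprobs,
    )
    top = response.choices[0].logprobs.content[0].top_logprobs
    result = {item.token: item.logprob for item in top}
    cache_file.write_text(json.dumps(result))
    return result

With the cache shared between runs, the decoder sees exactly the distributions the encoder used, regardless of what the API would return on a fresh call.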

2. Interval Collapse

The most critical issue: when the LLM’s probability distributions are highly skewed (first token getting 70-90% probability), the arithmetic coding interval repeatedly straddles the binary midpoint without being able to encode bits.

Example of the problem:

Step 47: Interval [2144838895, 2164274060]
Binary representation:
Low: 01111111110101111010010011101111
High: 10000001000000000011001110001011
Message: 10000000000000000000000000000000

No common prefix bits → Cannot encode → Interval shrinks → Failure

The algorithm got stuck selecting the highest probability token repeatedly until the interval became too narrow to represent even two tokens.
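
The condition is easy to spot programmatically: the interval straddles the binary midpoint, so its bounds can never agree on a leading bit, and each further step can only shrink it. A small illustrative check (not the repository's code):

MID = 1 << 31      # binary midpoint of the 32-bit coding interval

def straddles_midpoint(low, high):
    """True when low and high - 1 disagree on their most significant bit,
    so no common prefix bit can be emitted and the interval can only shrink."""
    return low < MID < high

# The step-47 interval from the failure above:
print(straddles_midpoint(2144838895, 2164274060))   # True: no bits can be encoded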

3. Tokenization Mismatches

Tokenization can differ between the encoding and decoding phases: GPT models don't always tokenize text the same way when it is part of a prompt versus when it is generated. The workaround implemented in the tool is to filter out candidate tokens that share a common prefix.
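
One way such a filter can look, sketched here with an illustrative helper rather than the repository's exact code: before splitting the interval, drop every candidate whose text is a strict prefix of another candidate, so re-tokenizing the generated text cannot produce an ambiguous match, then renormalize what remains.

def filter_prefix_conflicts(probs):
    """Drop candidate tokens that are a strict prefix of another candidate.

    If both " park" and " parking" stayed in the distribution, the decoder could
    not tell which one the encoder chose once the cover text is re-tokenized.
    """
    tokens = list(probs)
    kept = {
        t: p
        for t, p in probs.items()
        if not any(other != t and other.startswith(t) for other in tokens)
    }
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}   # renormalize the survivors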

4. Model Limitations

The approach only works reliably with GPT-3.5-turbo-1106. Other models (GPT-4, newer GPT-3.5 versions) had even worse non-determinism issues.

Results and Limitations

What Works

  • Successfully encodes/decodes messages
  • Generates mostly grammatically correct, semantically meaningful cover text, though it can select awkward continuations (see “parkI” above)
  • Provides a concrete demonstration of steganographic capabilities.

What Doesn’t Work

  • Messages longer than ~16 bits can fail due to interval collapse
  • API non-determinism causes sporadic failures. The caching helps only if the encoder and decoder share it.
  • Limited to specific model versions (GPT-3.5)

Lessons Learned

The project became a fascinating technical challenge that taught me valuable lessons about the limitations of current LLM APIs for precise, deterministic applications.

  1. LLM APIs aren’t designed for deterministic operations: The lack of true reproducibility in API responses makes certain applications infeasible.

  2. Arithmetic coding requires careful probability management: Highly skewed distributions from modern LLMs break traditional arithmetic coding assumptions.

  3. Proof-of-concepts reveal important limitations: Practical implementation revealed fundamental blockers that wouldn’t be apparent from theoretical analysis alone.

  4. Persistence has diminishing returns: After hitting the same fundamental API limitations repeatedly, it’s important to recognize when to document findings and move on.

Future Directions

While this implementation has limitations, the concept opens several research directions:

  1. Local model implementation: Running models locally could eliminate API non-determinism
  2. Alternative encoding schemes: Exploring synonym substitution or other steganographic methods
  3. Probability flattening: Applying temperature scaling after receiving logprobs to prevent interval collapse (see the sketch after this list)
  4. Detection methods: Developing techniques to identify steganographically encoded text
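
For the probability-flattening direction, here is a minimal sketch of what client-side flattening could look like, applied to the returned logprobs before the interval is split; the temperature value is an illustrative choice, not a tuned one.

import math

def flatten(logprobs, temperature=2.0):
    """Re-normalize logprobs at a higher temperature to flatten skewed distributions."""
    scaled = {t: lp / temperature for t, lp in logprobs.items()}
    max_lp = max(scaled.values())                    # subtract the max for numerical stability
    weights = {t: math.exp(lp - max_lp) for t, lp in scaled.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

# A 90%-dominant token drops to roughly 69% after flattening at temperature 2:
print(flatten({" sunny": math.log(0.9), " cloudy": math.log(0.08), " rainy": math.log(0.02)}))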

Code and Resources

The implementation is available at: https://github.com/artkpv/arithmetic-coding-steganography/

Key features:

  • CLI interface for encoding/decoding
  • Python API for integration
  • Caching system for API responses
  • Comprehensive test suite

Conclusion

This project demonstrates both the potential and limitations of arithmetic coding steganography with current LLM APIs. While the technical challenges prevented a fully robust implementation, the work provides valuable insights into encoded reasoning capabilities and the practical constraints of building precise tools on top of probabilistic language models.

The journey from “this should work in theory” to “here’s why it doesn’t work in practice” exemplifies the importance of implementation work in AI safety research. Sometimes the most valuable outcome isn’t a working system, but a deep understanding of why certain approaches fail.

References

[1]: Ziegler, Zachary, Yuntian Deng, and Alexander Rush. ‘Neural Linguistic Steganography’. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), edited by Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan. Association for Computational Linguistics, 2019. https://doi.org/10.18653/v1/D19-1115.


If you’re interested in AI safety, steganography, or have ideas for overcoming these technical challenges, I’d love to hear from you.