MCMC Substitution Cipher Solver

C++17 high-performance rewrite with bigram count scoring; ~95× faster with multithreaded chains + simulated annealing.

About This Project

MCMC Substitution Cipher Solver: C++ High-Performance Edition

This project is a C++ implementation of a substitution cipher solver using a Metropolis–Hastings MCMC algorithm enhanced with Simulated Annealing. It models deciphering as a search over permutation keys, using a training corpus (e.g., War and Peace, Shakespeare) to estimate unigram and bigram statistics that define a log-likelihood for candidate decodings.

From Python to C++: Why it’s Faster

  • Raw speed: compiled C++ with low-level control and -O3 optimizations.
  • True parallelism: runs multiple independent MCMC chains concurrently (std::thread).
  • Fine-grained optimization: careful data structures and precomputation.

Core Improvements

  • Optimized log-likelihood via precomputed bigram transition counts. The secret message is scanned once to build a counts matrix, reducing each MH step from O(n) over text length to O(1) using the counts.
  • Simulated Annealing: a temperature parameter starts high and cools down, improving exploration early and convergence later.
  • Precomputed logarithms of frequency/transition matrices to avoid repeated std::log calls in the hot loop.

Parallelism

  • Multiple MCMC chains launched via std::thread, with starting points seeded differently.
  • std::mutex protects updates to the global best solution across threads.
  • Chain count auto-scales to std::thread::hardware_concurrency().

Tooling & Usability

  • CMake build with -O3; consistent style via clang-format.
  • Two executables:
    • scramble_text: encodes plaintext with a random key and saves the key.
    • run_deciphering: decodes with flags like -i (training), -d (secret), -iters, -print_every, -o (output file).

Performance

  • ~95× faster than the Python baseline when accounting for total iterations (C++ with 8 threads in tests).
  • Higher quality results from precomputed counts + annealing + throughput.

Notes

  • The C++ version handles some inputs that the Python baseline rejected due to unsupported characters.

Technologies Used

C++17CMake (-O3)std::threadstd::mutexMetropolis–HastingsSimulated AnnealingBigram countsCLIclang-format