CoDA-1.7B: A Bidirectional Discrete-Diffusion Code Model with Parallel Token Generation
Salesforce AI Research has published CoDA-1.7B, a discrete-diffusion language model for code generation that denoises entire sequences with bidirectional context and updates multiple tokens in parallel. The release includes Base and Instruct checkpoints, along with a reproducible end-to-end training, evaluation, and serving stack.
Architecture and training pipeline
CoDA adapts a 1.7B-parameter transformer backbone to discrete diffusion for textual code generation. During training, sequences are masked and then iteratively denoised using full-sequence attention. This approach naturally supports infilling and non-autoregressive decoding because there is no fixed left-to-right generation order.
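To make the training setup concrete, here is a minimal PyTorch-style sketch of a masked-denoising objective of the kind described above. The mask-ratio schedule, tensor shapes, and model interface are illustrative assumptions, not CoDA's actual code.

```python
import torch
import torch.nn.functional as F

def diffusion_denoising_loss(model, tokens, mask_id, vocab_size):
    """One training step of masked denoising with full-sequence attention.

    Assumes `model` is a bidirectional (non-causal) transformer returning
    logits of shape (batch, seq_len, vocab_size), and `tokens` is a
    (batch, seq_len) tensor of token ids. Illustrative sketch only.
    """
    batch, seq_len = tokens.shape

    # Sample a per-example masking ratio (a simple stand-in for the noise
    # schedule used in discrete-diffusion training).
    mask_ratio = torch.rand(batch, 1, device=tokens.device)
    mask = torch.rand(batch, seq_len, device=tokens.device) < mask_ratio

    # Corrupt the sequence: replace masked positions with a [MASK] token.
    noisy_tokens = torch.where(mask, torch.full_like(tokens, mask_id), tokens)

    # Full-sequence (bidirectional) attention: every position sees every other.
    logits = model(noisy_tokens)  # (batch, seq_len, vocab_size)

    # Cross-entropy only on the masked positions the model must recover.
    loss = F.cross_entropy(
        logits[mask].view(-1, vocab_size),
        tokens[mask].view(-1),
    )
    return loss
```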
The model card documents a three-stage pipeline:
- Pre-training with bidirectional masking, teaching the model to recover masked tokens using full-context attention.
- Supervised post-training (SFT/Instruct) to align the model for instruction-style uses and improve code generation quality.
- Progressive denoising at inference, where masked sequences are gradually restored over multiple diffusion steps (sketched below).
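The third stage can be sketched as a simple decoding loop: all generated positions start as [MASK], and at each step the model scores every masked position in parallel and commits the most confident predictions. This is a hypothetical sketch of progressive denoising under an assumed model interface, not CoDA's actual decoder.

```python
import torch

@torch.no_grad()
def progressive_denoise(model, prompt_ids, gen_len, mask_id, steps=64):
    """Progressively unmask a fully masked completion over at most `steps` rounds.

    Hypothetical sketch: `model` is assumed to return logits of shape
    (batch, seq_len, vocab_size) under full bidirectional attention.
    """
    device = prompt_ids.device
    # Prompt followed by a fully masked block to be generated.
    seq = torch.cat(
        [prompt_ids, torch.full((gen_len,), mask_id, dtype=torch.long, device=device)]
    ).unsqueeze(0)  # (1, prompt_len + gen_len)

    for step in range(steps):
        masked = seq[0] == mask_id
        remaining = int(masked.sum())
        if remaining == 0:
            break

        logits = model(seq)[0]  # (total_len, vocab), all positions scored in parallel
        confidence, prediction = logits.softmax(dim=-1).max(dim=-1)

        # Commit the most confident masked positions; spread the budget so
        # everything is filled by the final step.
        budget = max(1, remaining // (steps - step))
        confidence = confidence.masked_fill(~masked, float("-inf"))
        commit = confidence.topk(budget).indices
        seq[0, commit] = prediction[commit]

    return seq[0, prompt_ids.shape[0]:]
```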
Salesforce provides reproducible scripts for TPU pre-training, GPU fine-tuning, and evaluation to enable replication.
Key features
- Bidirectional context via diffusion denoising, avoiding a fixed generation order and enabling native infilling.
- Confidence-guided sampling (entropy-style decoding) that lets users trade quality for speed by adjusting the decoding algorithm and its temperature (illustrated after this list).
- Open training and deployment pipeline, with CLI and deploy scripts for end-to-end workflows.
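The entropy-style confidence scoring behind the second feature can be illustrated as follows: masked positions with low predictive entropy are filled first, and a temperature term adds noise to the ranking. This is a generic sketch of entropy-guided selection, not CoDA's exact scoring rule.

```python
import torch

def entropy_guided_selection(logits, masked, num_to_commit, alg_temp=0.0):
    """Pick which masked positions to commit, ranked by predictive entropy.

    Lower entropy means the model is more certain about a position, so those
    positions are filled first. A non-zero `alg_temp` perturbs the ranking
    with Gumbel noise, trading determinism for diversity. Illustrative only.
    """
    probs = logits.softmax(dim=-1)                             # (seq_len, vocab)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)   # (seq_len,)

    score = -entropy                                           # higher = more confident
    if alg_temp > 0:
        gumbel = -torch.log(-torch.log(torch.rand_like(score).clamp_min(1e-9)))
        score = score / alg_temp + gumbel

    score = score.masked_fill(~masked, float("-inf"))          # only masked positions compete
    return score.topk(num_to_commit).indices
```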
Benchmarks and performance
CoDA-1.7B-Instruct reports competitive pass@1 scores on standard code-generation benchmarks: HumanEval 54.3%, HumanEval+ 47.6%, MBPP 47.2%, MBPP+ 63.2%, and an EvalPlus aggregate of 55.4%. For perspective, some 7B diffusion baselines such as Dream-7B-Instruct report HumanEval around 57.9%, which makes CoDA competitive with larger diffusion models on several metrics despite its much smaller 1.7B footprint.
These results suggest that discrete diffusion with parallel token updates and full attention can be an efficient approach for code generation at smaller model scales.
Inference behavior and tuning
Inference cost and latency are governed by the number of diffusion steps. CoDA exposes knobs to tune the latency/quality trade-off, including STEPS, ALG="entropy", ALG_TEMP, and block length. Because tokens are updated in parallel under full attention, CoDA aims for lower wall-clock latency at small scale relative to larger diffusion models when using comparable step budgets.
The confidence-guided sampling options allow practitioners to prioritize faster decoding at the cost of some quality or to favor higher-quality outputs with more steps or lower temperature.
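As a rough launch sketch, the documented knobs could be set before starting the server. STEPS, ALG, and ALG_TEMP are the names given above, but whether start_server.sh reads them as environment variables, and how the block length is configured, are assumptions to verify against the repository.

```python
import os
import subprocess

# Launch sketch only: knob names come from the documentation, but the exact
# mechanism by which start_server.sh consumes them is an assumption.
env = dict(
    os.environ,
    STEPS="64",       # more diffusion steps: higher quality, higher latency
    ALG="entropy",    # confidence-guided (entropy-style) decoding
    ALG_TEMP="0.1",   # randomness added to the confidence ranking
)

subprocess.run(["bash", "start_server.sh"], env=env, check=True)
```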
Deployment, licensing and tooling
The repository includes a FastAPI server with OpenAI-compatible APIs and an interactive CLI for local inference; instructions cover environment setup and a start_server.sh launcher. Model cards and a Hugging Face collection centralize artifacts. Checkpoints are published on Hugging Face under CC BY-NC 4.0.
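Because the server speaks an OpenAI-compatible API, any standard client can talk to it once it is running. The sketch below uses the official openai Python client; the local port and model identifier are placeholder assumptions to check against the serving instructions.

```python
from openai import OpenAI

# The base_url port and model identifier are placeholders; check the
# repository's serving instructions for the values exposed by start_server.sh.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="Salesforce/CoDA-v0-Instruct",
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a linked list."}
    ],
)
print(response.choices[0].message.content)
```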
You can find the model and artifacts on Hugging Face at https://huggingface.co/Salesforce/CoDA-v0-Instruct.
What this release means
CoDA-1.7B is a clear reference implementation for discrete-diffusion-based code generation at a compact scale: it demonstrates bidirectional denoising with parallel token updates, provides a reproducible pipeline from pre-training to SFT and serving, and delivers competitive benchmark results compared with some larger diffusion models. The exposed inference knobs and open deployment stack make it operationally useful for experimentation and local deployment.