ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation

📄 Paper • 🏠 Repo • 🤖 Models • 📚 Datasets

Introduction

ReflectionCoder is a novel approach that effectively leverages reflection sequences constructed by integrating compiler feedback to improve one-off code generation performance. Please refer to our paper for more details!

Models

Model	Checkpoint	Size	HumanEval (+)	MBPP (+)	License
ReflectionCoder-CL-7B	🤗 HF Link	7B	75.0 (68.9)	72.2 (61.4)	Llama2
ReflectionCoder-CL-34B	🤗 HF Link	34B	70.7 (66.5)	68.4 (56.6)	Llama2
ReflectionCoder-DS-6.7B	🤗 HF Link	6.7B	80.5 (74.4)	81.5 (69.6)	DeepSeek
ReflectionCoder-DS-33B	🤗 HF Link	33B	82.9 (76.8)	84.1 (72.0)	DeepSeek

Datasets

Dataset	Link	License
ReflectionSeq-GPT	🤗 HF Link	License
ReflectionSeq-DS	🤗 HF Link	License

Training and Evaluation

Download Data

python data/download.py

Expected File Tree

data
  - train
    - code_instruct.jsonl
    - reflection_ds.jsonl
    - reflection_gpt.jsonl
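
As a quick sanity check after the download, a minimal sketch along these lines can confirm that the tree matches. The helper name `missing_train_files` is hypothetical and not part of this repository; the file names are the ones listed above.

```python
from pathlib import Path

# File names taken from the expected tree above.
EXPECTED_TRAIN_FILES = (
    "code_instruct.jsonl",
    "reflection_ds.jsonl",
    "reflection_gpt.jsonl",
)


def missing_train_files(data_root="data"):
    """Return the expected training files that are absent under <data_root>/train.

    Hypothetical helper for illustration only; it is not shipped with the repo.
    """
    train_dir = Path(data_root) / "train"
    return [name for name in EXPECTED_TRAIN_FILES if not (train_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_train_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected training files are present.")
```

Run it from the repository root so that the relative `data` path resolves to the directory shown above.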

Train Script

You can use the following command to fine-tune your model with our method. Here, we assume 16 GPUs and set gradient_accumulation_steps to 32. If you have a different number of GPUs, adjust gradient_accumulation_steps so that train_batch_size stays at 512.

RANK=...
MASTER_ADDR=...
MASTER_PORT=...
WORLD_SIZE=...

model_cfg=meta-llama/CodeLlama-7b-Python-hf
out_dir=runs/relfection_coder_code_llama_7b

torchrun \
  --node_rank ${RANK} \
  --master_addr ${MASTER_ADDR} \
  --master_port ${MASTER_PORT} \
  --nnodes ${WORLD_SIZE} \
  --nproc_per_node 8 \
  train.py \
  --deepspeed config/stage_1.json \
  --learning_rate 5e-5 \
  --lr_scheduler_type cosine \
  --per_device_train_batch_size 1 \
  --max_len 4096 \
  --save_steps 100 \
  --warmup_ratio 0.05 \
  --logging_steps 10 \
  --seed 3407 \
  --num_train_epochs 2 \
  --report_to tensorboard \
  --remove_unused_columns false \
  --bf16 \
  --do_train \
  --save_safetensors \
  --save_only_model \
  --gradient_checkpointing \
  --train_file \
    data/train/code_instruct.jsonl \
    data/train/reflection_gpt.jsonl \
    data/train/reflection_ds.jsonl \
    data/train/reflection_gpt.jsonl \
    data/train/reflection_ds.jsonl \
  --logit \
  --block_mask \
  --block_order tce \
  --model_cfg ${model_cfg} \
  --output_dir ${out_dir} \
  --gradient_accumulation_steps 32
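
The batch-size adjustment described above can be sketched as follows. This assumes that train_batch_size is the product of the total GPU count, per_device_train_batch_size, and gradient_accumulation_steps, which is consistent with the 16 x 1 x 32 = 512 setting in the command. The function name is hypothetical and not part of train.py.

```python
def gradient_accumulation_steps(num_gpus, per_device_batch_size=1, target_batch_size=512):
    """Return the accumulation steps that keep the global batch at target_batch_size.

    Assumption: global batch = num_gpus * per_device_batch_size * accumulation steps.
    """
    samples_per_step = num_gpus * per_device_batch_size
    if samples_per_step <= 0 or target_batch_size % samples_per_step != 0:
        raise ValueError(
            f"{target_batch_size} is not divisible by "
            f"{num_gpus} GPUs x {per_device_batch_size} samples per device"
        )
    return target_batch_size // samples_per_step


if __name__ == "__main__":
    # Print the value to pass to --gradient_accumulation_steps for a few cluster sizes.
    for num_gpus in (8, 16, 32):
        print(num_gpus, gradient_accumulation_steps(num_gpus))
```

With 16 GPUs this reproduces the value 32 used in the command; with a single 8-GPU node the same global batch would need 64 steps.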

Test Script

out_dir=runs/relfection_coder_code_llama_7b/checkpoint-final

python test.py -tp 2 -p ${out_dir} -t humaneval mbpp multiple

Then, use EvalPlus to evaluate the inference results. Note that you should install the nightly version of EvalPlus with pip install "git+https://github.com/evalplus/evalplus.git" --upgrade.

out_dir=runs/relfection_coder_code_llama_7b/checkpoint-final

evalplus.evaluate --dataset humaneval --samples ${out_dir}/results/humaneval.jsonl
evalplus.evaluate --dataset mbpp --samples ${out_dir}/results/mbpp.jsonl

We also provide generated results for HumanEval and MBPP in data.

For MultiPL-E (the multiple task), you can use bigcode-evaluation-harness to evaluate the inference results. For example, you can evaluate Java with the following command:

out_dir=.../runs/relfection_coder_code_llama_7b/checkpoint-final

cd .../bigcode-evaluation-harness

python3 main.py \
  --model relfection_coder_code_llama_7b \
  --tasks multiple-java \
  --allow_code_execution \
  --load_generations_path ${out_dir}/results/multiple_java.json \
  --metric_output_path ${out_dir}/results/multiple_java_result.json

When testing MultiPL-E, there are two things to pay attention to:

For Java, the testing code passes a wrong parameter. In bigcode_eval/tasks/custom_metrics/multiple_metrics/eval_java.py, replace result = run(["java", "-ea", "-cp", f"{outdir}", "Problem"], env=sys_env) with result = run(["java", "-ea", "-cp", f"{outdir}:{javatuples_path}", "Problem"], env=sys_env). Note that, for a fair comparison, we only fix this bug when evaluating DeepSeek-Coder and use the original code when evaluating Code Llama.
For C-Sharp, the testing code in nuprl/MultiPL-E has some bugs; please use https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/Evaluation/HumanEval/data/humaneval-cs-bu.jsonl instead.
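
The Java change in the first note amounts to adding the javatuples location to the classpath passed to `java`. A minimal sketch of how that argument list differs is shown below. The helper `java_run_args` and the example paths are hypothetical and only illustrate the two variants; the harness itself builds the list inline in eval_java.py.

```python
def java_run_args(outdir, javatuples_path=None):
    """Build the argument list used to run the compiled `Problem` class.

    Hypothetical helper for illustration. With javatuples_path=None it mirrors
    the original call; otherwise it mirrors the patched call, joining the two
    classpath entries with ":" as in the replacement line above.
    """
    classpath = outdir if javatuples_path is None else f"{outdir}:{javatuples_path}"
    return ["java", "-ea", "-cp", classpath, "Problem"]


if __name__ == "__main__":
    # Example paths are placeholders, not locations used by the harness.
    print(java_run_args("/tmp/out"))
    print(java_run_args("/tmp/out", "/opt/javatuples.jar"))
```

Note that ":" is the classpath separator on Unix-like systems, which matches the replacement line quoted above.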

Citation

If you find this repo useful for your research, please kindly cite our paper:

@misc{ren2024reflectioncoder,
    title={ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation}, 
    author={Houxing Ren and Mingjie Zhan and Zhongyuan Wu and Aojun Zhou and Junting Pan and Hongsheng Li},
    year={2024},
    eprint={2405.17057},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Acknowledgments

We thank the following amazing projects that truly inspired us:
