| frameworks: | |
| - Pytorch | |
| tasks: | |
| - text-generation | |
| pipeline_tag: text-generation | |
| # Model Card for CodeFuse-StarCoder2-15B | |
| <p align="center"> | |
| <img src="https://modelscope.cn/api/v1/models/codefuse-ai/CodeFuse-StarCoder2-15B/repo?Revision=master&FilePath=LOGO.jpg&View=true" width="800"/> | |
| <p> | |
| [[中文]](#chinese) [[English]](#english) | |
| #### Clone with HTTP | |
| ```bash | |
| git clone https://huggingface.co/codefuse-ai/CodeFuse-StarCoder2-15B.git | |
| ``` | |
| <a id="english"></a> | |
| ## Model Description | |
| CodeFuse-StarCoder2-15B is a 15B-parameter code LLM obtained by fine-tuning the base model StarCoder2-15B with LoRA on multiple code-related tasks. | |
| <br> | |
| ## News and Updates | |
| 🔥🔥🔥 2024-05-20 CodeFuse-StarCoder2-15B has been released, achieving a pass@1 (greedy decoding) score of 73.17% on HumanEval. | |
| 🔥🔥 2024-01-12 CodeFuse-DeepSeek-33B has been released, achieving a pass@1 (greedy decoding) score of 78.65% on HumanEval. | |
| 🔥🔥 2024-01-12 CodeFuse-Mixtral-8x7B has been released, achieving a pass@1 (greedy decoding) score of 56.1% on HumanEval, which is a 15% increase compared to Mixtral-8x7b's 40%. | |
| 🔥🔥 2023-11-10 CodeFuse-CodeGeeX2-6B has been released, achieving a pass@1 (greedy decoding) score of 45.12% on HumanEval, a 9.22-point increase over CodeGeeX2's 35.9%. | |
| 🔥🔥 2023-10-20 The CodeFuse-QWen-14B technical documentation has been released. For details, see the CodeFuse article on our WeChat official account: https://mp.weixin.qq.com/s/PCQPkvbvfxSPzsqjOILCDw | |
| 🔥🔥 2023-10-16 CodeFuse-QWen-14B has been released, achieving a pass@1 (greedy decoding) score of 48.78% on HumanEval, which is a 16% increase compared to Qwen-14b's 32.3%. | |
| 🔥🔥 2023-09-27 CodeFuse-StarCoder-15B has been released, achieving a pass@1 (greedy decoding) score of 54.9% on HumanEval, which is a 21% increase compared to StarCoder's 33.6%. | |
| 🔥🔥 2023-09-26 We released the 4-bit quantized version of CodeFuse-CodeLlama-34B. After quantization, the model still achieves 73.8% pass@1 (greedy decoding) on HumanEval. | |
| 🔥🔥 2023-09-11 CodeFuse-CodeLlama-34B achieved 74.4% pass@1 (greedy decoding) on HumanEval, the state-of-the-art result for open-source LLMs at the time. | |
| <br> | |
| ## Code Community | |
| **Homepage**: 🏡 https://github.com/codefuse-ai (**Please give us your support with a Star🌟 + Fork🚀 + Watch👀**) | |
| + If you wish to fine-tune the model yourself, you can visit ✨[MFTCoder](https://github.com/codefuse-ai/MFTCoder)✨✨ | |
| + If you wish to see a demo of the model, you can visit ✨[CodeFuse Demo](https://github.com/codefuse-ai/codefuse)✨✨ | |
| <br> | |
| ## Performance | |
| ### HumanEval | |
| | Model | HumanEval(pass@1) | Date | | |
| | :------------------------------- | :---------------: | :-----: | | |
| | **CodeFuse-StarCoder2-15B** | **73.17%** | 2024.05 | | |
| | **CodeFuse-DeepSeek-33B** | **78.65%** | 2024.01 | | |
| | **CodeFuse-Mixtral-8x7B** | 56.10% | 2024.01 | | |
| | **CodeFuse-CodeLlama-34B** | 74.4% | 2023.9 | | |
| | **CodeFuse-CodeLlama-34B-4bits** | 73.8% | 2023.9 | | |
| | **CodeFuse-StarCoder-15B** | 54.9% | 2023.9 | | |
| | **CodeFuse-QWen-14B** | 48.78% | 2023.10 | | |
| | **CodeFuse-CodeGeeX2-6B** | 45.12% | 2023.11 | | |
| | WizardCoder-Python-34B-V1.0 | 73.2% | 2023.8 | | |
| | GPT-4(zero-shot) | 67.0% | 2023.3 | | |
| | PanGu-Coder2 15B | 61.6% | 2023.8 | | |
| | CodeLlama-34b-Python | 53.7% | 2023.8 | | |
| | CodeLlama-34b | 48.8% | 2023.8 | | |
| | GPT-3.5(zero-shot) | 48.1% | 2022.11 | | |
| | OctoCoder | 46.2% | 2023.8 | | |
| | StarCoder-15B | 33.6% | 2023.5 | | |
| | Qwen-14b | 32.3% | 2023.10 | | |
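Under greedy decoding each problem gets exactly one completion, so pass@1 reduces to the share of problems whose completion passes all unit tests. The sketch below illustrates that arithmetic only; `pass_at_1` is a hypothetical helper, and the count of 120 solved problems is an illustrative value chosen because 120 of HumanEval's 164 problems rounds to 73.17%, not a figure reported by the authors.

```python
from typing import Sequence


def pass_at_1(passed: Sequence[bool]) -> float:
    """Return pass@1 in percent when there is one greedy sample per problem."""
    if not passed:
        raise ValueError("at least one problem result is required")
    return 100.0 * sum(passed) / len(passed)


# Illustrative per-problem outcomes: 120 passes out of 164 problems (assumed split).
results = [True] * 120 + [False] * 44
print(f"{pass_at_1(results):.2f}%")  # prints 73.17%
```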
| ### HumanEval-X and MBPP(500) | |
| | Model | python | js |java |cpp |go |MBPP-500| | |
| | :------------------------------- | :---------------: | :-----: | :-----: | :-----: | :-----: |:-----: | | |
| | **CodeFuse-StarCoder2-15B** | 73.17% | 67.68% |69.51% |60.98% |56.71% |62.80% | | |
| <br> | |
| ## Requirements | |
| * python>=3.8 | |
| * pytorch>=2.1.0 | |
| * transformers>=4.40.0 | |
| * Sentencepiece | |
| * CUDA >=11.4 | |
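A quick way to compare an environment against the list above is to read installed package versions with the standard library. This is a minimal sketch: `MIN_VERSIONS`, `parse_version`, and `check_requirements` are hypothetical names, and it assumes the usual PyPI distribution names (`torch`, `transformers`, `sentencepiece`). It does not check the CUDA version.

```python
import sys
from importlib import metadata
from typing import Dict, Optional, Tuple

# Minimum versions taken from the requirements list; sentencepiece has no stated minimum.
MIN_VERSIONS = {
    "torch": (2, 1, 0),
    "transformers": (4, 40, 0),
    "sentencepiece": (0, 0, 0),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Keep the leading numeric components, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:  # pad so '4.40' compares equal to (4, 40, 0)
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums=MIN_VERSIONS) -> Dict[str, Optional[bool]]:
    """Map each package to True/False (meets minimum) or None (not installed)."""
    report: Dict[str, Optional[bool]] = {}
    for name, minimum in minimums.items():
        try:
            installed = parse_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            report[name] = None
            continue
        report[name] = installed >= minimum
    return report


print("python>=3.8:", sys.version_info >= (3, 8))
print(check_requirements())
```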
| <br> | |
| ## Inference String Format | |
| The inference string is formed by concatenating the conversation data (system, human, and bot contents) in the same format as the training data. It is the input passed to the model at inference time. | |
| The following examples show the prompt layout: | |
| **Multi-Round with System Prompt:** | |
| ```python | |
| """ | |
| <s>system | |
| System instruction | |
| <s>human | |
| Human 1st round input | |
| <s>bot | |
| Bot 1st round output<|endoftext|> | |
| <s>human | |
| Human 2nd round input | |
| <s>bot | |
| Bot 2nd round output<|endoftext|> | |
| ... | |
| ... | |
| ... | |
| <s>human | |
| Human nth round input | |
| <s>bot | |
| """ | |
| ``` | |
| **Single-Round without System Prompt:** | |
| ```python | |
| """ | |
| <s>human | |
| User prompt... | |
| <s>bot | |
| """ | |
| ``` | |
| In this format, the system section is optional, and the conversation can be single-turn or multi-turn. At inference time, always end the input string with "\<s\>bot\n" to prompt the model to generate an answer. | |
| For example, the prompt format used for HumanEval inference is as follows: | |
| ``` | |
| <s>human | |
| # language: Python | |
| from typing import List | |
| def separate_paren_groups(paren_string: str) -> List[str]: | |
|     """ Input to this function is a string containing multiple groups of nested parentheses. Your goal is to | |
|     separate those group into separate strings and return the list of those. | |
|     Separate groups are balanced (each open brace is properly closed) and not nested within each other | |
|     Ignore any spaces in the input string. | |
|     >>> separate_paren_groups('( ) (( )) (( )( ))') | |
|     ['()', '(())', '(()())'] | |
|     """ | |
| <s>bot | |
| ``` | |
| In addition, we prepend the programming language tag used by CodeGeeX models (e.g. "```# language: Python```" for Python). | |
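The format described above can be assembled programmatically. The following is a minimal sketch, not the authors' code: `build_prompt` and its parameters are hypothetical, the role tags follow the layout shown in this section, and the end-of-sequence string defaults to `<|endoftext|>` on the assumption that it matches the EOS token the tokenizer is configured with in the Quickstart.

```python
from typing import List, Optional, Tuple

SYSTEM_ROLE_START_TAG = "<s>system\n"
HUMAN_ROLE_START_TAG = "<s>human\n"
BOT_ROLE_START_TAG = "<s>bot\n"


def build_prompt(
    user_input: str,
    history: Optional[List[Tuple[str, str]]] = None,
    system: Optional[str] = None,
    language: Optional[str] = None,
    eos_token: str = "<|endoftext|>",  # assumption: same EOS string as the tokenizer
) -> str:
    """Concatenate system, past rounds, and the new human turn into one string."""
    parts = []
    if system:
        parts.append(f"{SYSTEM_ROLE_START_TAG}{system}\n")
    # Each finished round is a human turn followed by a bot turn closed with EOS.
    for human_text, bot_text in history or []:
        parts.append(f"{HUMAN_ROLE_START_TAG}{human_text}\n")
        parts.append(f"{BOT_ROLE_START_TAG}{bot_text}{eos_token}\n")
    if language:
        # Optional CodeGeeX-style language tag placed before the user input.
        user_input = f"# language: {language}\n{user_input}"
    parts.append(f"{HUMAN_ROLE_START_TAG}{user_input}\n")
    # The prompt must end with the bot tag so the model continues as the bot.
    parts.append(BOT_ROLE_START_TAG)
    return "".join(parts)


print(build_prompt("Write a QuickSort program", language="Python"))
```

For a multi-round conversation, pass earlier exchanges as `history=[("question", "answer"), ...]`; the returned string always ends with the bot tag.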
| ## Quickstart | |
| ```python | |
| import torch | |
| from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig | |
| model_dir = "codefuse-ai/CodeFuse-StarCoder2-15B" | |
| def load_model_tokenizer(model_path): | |
|     tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True) | |
|     tokenizer.eos_token = "<|endoftext|>" | |
|     tokenizer.pad_token = "<|endoftext|>" | |
|     tokenizer.eos_token_id = tokenizer.convert_tokens_to_ids(tokenizer.eos_token) | |
|     tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(tokenizer.pad_token) | |
|     tokenizer.padding_side = "left" | |
|     model = AutoModelForCausalLM.from_pretrained(model_path, device_map='auto', torch_dtype=torch.bfloat16, trust_remote_code=True) | |
|     return model, tokenizer | |
| HUMAN_ROLE_START_TAG = "<s>human\n" | |
| BOT_ROLE_START_TAG = "<s>bot\n" | |
| text_list = [f'{HUMAN_ROLE_START_TAG}Write a QuickSort program\n#Python\n{BOT_ROLE_START_TAG}'] | |
| model, tokenizer = load_model_tokenizer(model_dir) | |
| inputs = tokenizer(text_list, return_tensors='pt', padding=True, add_special_tokens=False).to('cuda') | |
| input_ids = inputs["input_ids"] | |
| attention_mask = inputs["attention_mask"] | |
| # do_sample=False selects greedy decoding, so temperature and top_p have no effect here | |
| generation_config = GenerationConfig( | |
|     eos_token_id=tokenizer.eos_token_id, | |
|     pad_token_id=tokenizer.pad_token_id, | |
|     temperature=0.1, | |
|     max_new_tokens=512, | |
|     num_return_sequences=1, | |
|     num_beams=1, | |
|     top_p=0.95, | |
|     do_sample=False | |
| ) | |
| outputs = model.generate( | |
|     inputs=input_ids, | |
|     attention_mask=attention_mask, | |
|     **generation_config.to_dict() | |
| ) | |
| gen_text = tokenizer.batch_decode(outputs[:, input_ids.shape[1]:], skip_special_tokens=True) | |
| print(gen_text[0]) | |
| ``` | |
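The decoded text may mix code with explanation. The sketch below pulls out the first fenced code block from a reply, assuming the model wraps code in Markdown fences; the source does not state that it always does, and `extract_first_code_block` is a hypothetical helper, not part of the model's tooling.

```python
import re
from typing import Optional

FENCE = "`" * 3  # a Markdown code fence, built here to keep this example readable
FENCE_RE = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_first_code_block(reply: str) -> Optional[str]:
    """Return the body of the first fenced block, or None if there is no fence."""
    match = FENCE_RE.search(reply)
    if match is None:
        return None
    return match.group(1).rstrip("\n")


# Stand-in for gen_text[0]; a real reply would come from the model.
sample_reply = "Here is the code:\n" + FENCE + "python\ndef f():\n    return 1\n" + FENCE + "\nDone."
print(extract_first_code_block(sample_reply))
```

When the function returns `None`, falling back to the raw reply is a reasonable default.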
| <a id="chinese"></a> | |
| ## Model Description | |
| CodeFuse-StarCoder2-15B is a code LLM obtained by fine-tuning the base model StarCoder2-15B with LoRA on multiple code tasks. | |
| <br> | |
| ## News and Updates | |
| 🔥🔥🔥 2024-05-20 CodeFuse-StarCoder2-15B has been released, achieving a pass@1 (greedy decoding) score of 73.17% on HumanEval. | |
| 🔥🔥 2024-01-12 CodeFuse-DeepSeek-33B has been released, achieving a pass@1 (greedy decoding) score of 78.65% on HumanEval. | |
| 🔥🔥 2023-11-10 CodeFuse-CodeGeeX2-6B has been open-sourced, achieving a pass@1 (greedy decoding) score of 45.12% on HumanEval, a 9.22-point increase over CodeGeeX2. | |
| 🔥🔥 2023-10-20 The CodeFuse-QWen-14B technical documentation has been published. For details, see the CodeFuse article on our WeChat official account: https://mp.weixin.qq.com/s/PCQPkvbvfxSPzsqjOILCDw | |
| 🔥🔥 2023-10-16 CodeFuse-QWen-14B has been open-sourced, achieving a pass@1 (greedy decoding) score of 48.78% on HumanEval, a 16% increase over Qwen-14b. | |
| 🔥🔥 2023-09-27 CodeFuse-StarCoder-15B has been open-sourced, achieving a pass@1 (greedy decoding) score of 54.9% on HumanEval, a 21% increase over StarCoder. | |
| 🔥🔥 2023-09-26 The 4-bit quantized version [CodeFuse-CodeLlama-34B 4bits](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B-4bits/summary) has been released. The quantized model achieves 73.8% pass@1 (greedy decoding) on HumanEval. | |
| 🔥🔥 2023-09-11 [CodeFuse-CodeLlama-34B](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B/summary) has been released, achieving 74.4% pass@1 (greedy decoding) on HumanEval, the open-source state of the art at the time. | |
| <br> | |
| ## Code Community | |
| **Homepage**: 🏡 https://github.com/codefuse-ai (**Please support our project with a Star🌟 + Fork🚀 + Watch👀**) | |
| + If you wish to fine-tune the model yourself, you can visit ✨[MFTCoder](https://github.com/codefuse-ai/MFTCoder)✨✨ | |
| + If you wish to see a demo of the model, you can visit ✨[CodeFuse Demo](https://github.com/codefuse-ai/codefuse)✨✨ | |
| <br> | |
| ## Performance | |
| ### Code | |
| | Model | HumanEval(pass@1) | Date | | |
| | :------------------------------- | :---------------: | :-----: | | |
| | **CodeFuse-StarCoder2-15B** | **73.17%** | 2024.05 | | |
| | **CodeFuse-DeepSeek-33B** | **78.65%** | 2024.01 | | |
| | **CodeFuse-Mixtral-8x7B** | 56.10% | 2024.01 | | |
| | **CodeFuse-CodeLlama-34B** | 74.4% | 2023.9 | | |
| | **CodeFuse-CodeLlama-34B-4bits** | 73.8% | 2023.9 | | |
| | **CodeFuse-StarCoder-15B** | 54.9% | 2023.9 | | |
| | **CodeFuse-QWen-14B** | 48.78% | 2023.10 | | |
| | **CodeFuse-CodeGeeX2-6B** | 45.12% | 2023.11 | | |
| | WizardCoder-Python-34B-V1.0 | 73.2% | 2023.8 | | |
| | GPT-4(zero-shot) | 67.0% | 2023.3 | | |
| | PanGu-Coder2 15B | 61.6% | 2023.8 | | |
| | CodeLlama-34b-Python | 53.7% | 2023.8 | | |
| | CodeLlama-34b | 48.8% | 2023.8 | | |
| | GPT-3.5(zero-shot) | 48.1% | 2022.11 | | |
| | OctoCoder | 46.2% | 2023.8 | | |
| | StarCoder-15B | 33.6% | 2023.5 | | |
| | Qwen-14b | 32.3% | 2023.10 | | |
| ### HumanEval-X and MBPP(500) | |
| | Model | python | js |java |cpp |go |MBPP-500 | | |
| | :------------------------------- | :---------------: | :-----: | :-----: | :-----: | :-----: |:-----: | | |
| | CodeFuse-StarCoder2-15B | 73.17% | 67.68% |69.51% |60.98% |56.71% |62.80% | | |
| ## Requirements | |
| * python>=3.8 | |
| * pytorch>=2.1.0 | |
| * transformers>=4.40.0 | |
| * Sentencepiece | |
| * CUDA >=11.4 | |
| <br> | |
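| ## Inference String Format | |
| The inference data is a string concatenated in the training data format, and prompts are assembled the same way at inference time. Below are the multi-round format with a system prompt and the single-round format without one: | |
| **Multi-Round with System Prompt:** | |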
| ```python | |
| """ | |
| <s>system | |
| System instruction | |
| <s>human | |
| Human 1st round input | |
| <s>bot | |
| Bot 1st round output<|endoftext|> | |
| <s>human | |
| Human 2nd round input | |
| <s>bot | |
| Bot 2nd round output<|endoftext|> | |
| ... | |
| ... | |
| ... | |
| <s>human | |
| Human nth round input | |
| <s>bot | |
| """ | |
| ``` | |
| **Single-Round without System Prompt:** | |
| ```python | |
| """ | |
| <s>human | |
| User prompt... | |
| <s>bot | |
| """ | |
| ``` | |
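| In this format, the system prompt is optional (set it as needed), and both single-round and multi-round conversations are supported. At inference time, make sure the concatenated prompt string ends with "\<s\>bot\n" to prompt the model to generate an answer. | |
| For example, the format used for HumanEval inference is as follows: | |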
| ```python | |
| <s>human | |
| # language: Python | |
| from typing import List | |
| def separate_paren_groups(paren_string: str) -> List[str]: | |
|     """ Input to this function is a string containing multiple groups of nested parentheses. Your goal is to | |
|     separate those group into separate strings and return the list of those. | |
|     Separate groups are balanced (each open brace is properly closed) and not nested within each other | |
|     Ignore any spaces in the input string. | |
|     >>> separate_paren_groups('( ) (( )) (( )( ))') | |
|     ['()', '(())', '(()())'] | |
|     """ | |
| <s>bot | |
| ``` | |
| In particular, we also use the programming language tag adopted by the CodeGeeX model series (for example, "```# language: Python```" for Python). | |
| ## Quickstart | |
| ```python | |
| import torch | |
| from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig | |
| model_dir = "codefuse-ai/CodeFuse-StarCoder2-15B" | |
| def load_model_tokenizer(model_path): | |
|     tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True) | |
|     tokenizer.eos_token = "<|endoftext|>" | |
|     tokenizer.pad_token = "<|endoftext|>" | |
|     tokenizer.eos_token_id = tokenizer.convert_tokens_to_ids(tokenizer.eos_token) | |
|     tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(tokenizer.pad_token) | |
|     tokenizer.padding_side = "left" | |
|     model = AutoModelForCausalLM.from_pretrained(model_path, device_map='auto', torch_dtype=torch.bfloat16, trust_remote_code=True) | |
|     return model, tokenizer | |
| HUMAN_ROLE_START_TAG = "<s>human\n" | |
| BOT_ROLE_START_TAG = "<s>bot\n" | |
| text_list = [f'{HUMAN_ROLE_START_TAG}Write a QuickSort program\n#Python\n{BOT_ROLE_START_TAG}'] | |
| model, tokenizer = load_model_tokenizer(model_dir) | |
| inputs = tokenizer(text_list, return_tensors='pt', padding=True, add_special_tokens=False).to('cuda') | |
| input_ids = inputs["input_ids"] | |
| attention_mask = inputs["attention_mask"] | |
| # do_sample=False selects greedy decoding, so temperature and top_p have no effect here | |
| generation_config = GenerationConfig( | |
|     eos_token_id=tokenizer.eos_token_id, | |
|     pad_token_id=tokenizer.pad_token_id, | |
|     temperature=0.2, | |
|     max_new_tokens=512, | |
|     num_return_sequences=1, | |
|     num_beams=1, | |
|     top_p=0.95, | |
|     do_sample=False | |
| ) | |
| outputs = model.generate( | |
|     inputs=input_ids, | |
|     attention_mask=attention_mask, | |
|     **generation_config.to_dict() | |
| ) | |
| gen_text = tokenizer.batch_decode(outputs[:, input_ids.shape[1]:], skip_special_tokens=True) | |
| print(gen_text[0]) | |
| ``` |