Required PyTorch version
I tried to run the model with PyTorch 1.9.0+cu111, but the generated text is bizarre, with duplicated words. What are the required versions of torch and the other libraries? Thank you.
Can you please share the code you used to generate text? The PyTorch version shouldn't affect generation. One thing to watch out for: do not pass the token_type_ids returned by the tokenizer to the model. Here's a working example that uses the model in both the standard and FIM (fill-in-the-middle) settings:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("bigcode/santacoder", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("bigcode/santacoder")
# Standard (left-to-right) completion example
input_text = "def all_odd_elements((L):\n"
# FIM example: wrap the prompt in the special tokens <fim-prefix>, <fim-suffix> and <fim-middle>
input_text_fim = "<fim-prefix>def fib(n):<fim-suffix> else:\n return fib(n - 2) + fib(n - 1)<fim-middle>"
# tokenizer(text) returns input_ids, attention_mask and token_type_ids; the last one must not be fed to the model,
# so if you want to call model(**inputs) or model.generate(**inputs), pass return_token_type_ids=False to the tokenizer
inputs = tokenizer(input_text, return_tensors="pt")  # add return_token_type_ids=False for model(**inputs)
inputs_fim = tokenizer(input_text_fim, return_tensors="pt")  # add return_token_type_ids=False for model(**inputs)
outputs = model.generate(inputs["input_ids"], max_new_tokens=18)
outputs_fim = model.generate(inputs_fim["input_ids"], max_new_tokens=25)
generation = [tokenizer.decode(tensor, skip_special_tokens=False) for tensor in outputs]
generation_fim = [tokenizer.decode(tensor, skip_special_tokens=False) for tensor in outputs_fim]
print(f"Standard example:\n {generation[0]}")
print(f"FIM example:\n {generation_fim[0]}")
Standard example:
def all_odd_elements((L):
return all(x % 2!= 0 for x in L)
FIM example:
<fim-prefix>def fib(n):<fim-suffix> else:
return fib(n - 2) + fib(n - 1)<fim-middle>
if n == 0:
return 0
elif n == 1:
return 1
<|endoftext|><fim-prefix>
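As the FIM output above shows, the decoded string still contains the whole prompt plus the special tokens, so the generated middle has to be cut out by hand. Here is a minimal sketch of one way to do that with plain string handling. The helper names `build_fim_prompt` and `extract_middle` are hypothetical and not part of transformers or the model repo; the token strings are the ones used in the example above.

```python
# Special tokens as they appear in the example above.
FIM_PREFIX = "<fim-prefix>"
FIM_SUFFIX = "<fim-suffix>"
FIM_MIDDLE = "<fim-middle>"
EOS = "<|endoftext|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a FIM prompt: prefix, then suffix, then the middle marker."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def extract_middle(decoded: str) -> str:
    """Return only the generated middle from a decoded FIM generation.

    Assumes the text was decoded with skip_special_tokens=False, so the
    markers are still present. Returns "" if <fim-middle> is missing.
    """
    # Everything after the first <fim-middle> marker is generated text.
    _, _, generated = decoded.partition(FIM_MIDDLE)
    # Cut at the end-of-text token if the model emitted one.
    middle, _, _ = generated.partition(EOS)
    return middle
```

With the example above, `extract_middle(generation_fim[0])` would return just the lines between `<fim-middle>` and `<|endoftext|>`, which can then be placed between the original prefix and suffix.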
Yeah, there was a mistake in my text generation code ^^ I have changed it and it is working well now.
Previously:
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
I checked the README.md again and changed it to:
inputs = tokenizer.encode(input_text, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=64)
I also just tried adding return_token_type_ids=False in the first case, and that works too. Thank you ^^
Great! You don't even need to specify return_token_type_ids=False now; we turned it off by default.
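If you are stuck on an older copy of the tokenizer that still returns token_type_ids, one defensive option is to filter the key out before unpacking the inputs into the model. This is only a sketch: `drop_token_type_ids` is a hypothetical helper, and it assumes the tokenizer output behaves like a mapping with `.items()`, as a plain dict does.

```python
from typing import Any, Dict, Mapping


def drop_token_type_ids(encoded: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of a tokenizer output without the token_type_ids entry."""
    return {key: value for key, value in encoded.items() if key != "token_type_ids"}


# Intended use with the snippet from earlier in the thread (not run here):
#   inputs = tokenizer(input_text, return_tensors="pt")
#   outputs = model.generate(**drop_token_type_ids(inputs), max_new_tokens=64)
```

The filter is a no-op when the key is absent, so it should be safe to keep after the default changed.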