Generation does not work properly after fine-tuning on a dataset without thinking

#46

by HuanLin - opened Jan 26, 2025

Discussion

HuanLin

Jan 26, 2025

•

edited Jan 26, 2025

Both @yanyongyu and I have run into this problem.

I am using an Alpaca-format dataset; @yanyongyu follows the chat template.

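My reproduction steps

Use the notebook
In the notebook's second code cell, change model_name to unsloth/DeepSeek-R1-Distill-Qwen-7B-unsloth-bnb-4bit
Use the ssbuild/alpaca_medical dataset
Train with the default parameters, then run inference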
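
For readers unfamiliar with the Alpaca format, the sketch below shows how one record is usually flattened into a single training string. It is illustrative only: the template wording, the `format_alpaca` helper, and the `<eos>` placeholder are assumptions, not the notebook's actual code. It highlights one thing worth checking in a setup like this: whether every training string ends with the tokenizer's EOS token, since a model that rarely sees EOS in its targets may not learn when to stop.

```python
# Hypothetical Alpaca-style template; the real notebook's wording may differ.
ALPACA_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)


def format_alpaca(record: dict[str, str], eos_token: str) -> str:
    """Render one Alpaca record as a training string that ends with EOS."""
    text = ALPACA_TEMPLATE.format(
        instruction=record["instruction"],
        input=record.get("input", ""),
        output=record["output"],
    )
    return text + eos_token


# "<eos>" is a placeholder; in practice pass tokenizer.eos_token instead.
sample = {"instruction": "Who are you?", "input": "", "output": "I am an assistant."}
formatted = format_alpaca(sample, eos_token="<eos>")
print(formatted)
```

Printing a few formatted records before training is a cheap way to confirm that the end-of-sequence marker is really there.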

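Symptoms

In-distribution prompts (taken from the training set): the output is gibberish
Out-of-distribution prompts (e.g. "你是谁", "Who are you?"): generation does not stop

(It looks like generation ran all the way to max_token; the trailing eos appears to have been added by the tokenizer during decode.)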
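
One way to test that suspicion is to look at the raw generated token ids before decoding, rather than at the decoded text. The sketch below is a minimal, library-free helper written for this purpose; `stop_reason` is a hypothetical name, and the EOS id 151643 is the value reported elsewhere in this thread for the distilled Qwen tokenizer, so treat it as an assumption for any other model.

```python
def stop_reason(generated_ids: list[int], eos_token_id: int, max_new_tokens: int) -> str:
    """Classify why generation ended, given only the newly generated token ids."""
    if eos_token_id in generated_ids:
        return "eos"      # the model itself emitted the end-of-sequence token
    if len(generated_ids) >= max_new_tokens:
        return "length"   # generation was cut off by the token budget
    return "other"        # e.g. a custom stopping criterion


EOS_ID = 151643  # assumed EOS id, as reported in this thread

print(stop_reason([11, 22, 33, EOS_ID], EOS_ID, max_new_tokens=8))  # eos
print(stop_reason([11, 22, 33, 44], EOS_ID, max_new_tokens=4))      # length
```

If the result is "length" for the raw ids while the decoded text still shows an EOS marker, the marker did not come from the model.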

@yanyongyu will add his reproduction steps shortly.

yanyongyu

Jan 26, 2025

•

edited Jan 26, 2025

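I am using a closed-source dataset, so the data-related parts are omitted and replaced with placeholders. The model being fine-tuned is distill qwen 7b. After a short period of training, the model generates endlessly (it repeats a short passage and then produces garbage, mostly question marks and periods) and never emits eos.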

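Training uses trl.SFTTrainer. After preprocessing, the dataset is in the chat template input format that trl accepts, i.e. {"prompt": "xxx", "completion": "xxx"}, and trl preprocesses it with tokenizer.apply_chat_template. The data contains no thinking content, only prompt and answer. I checked that the training examples contain the eos token after apply chat template, and that 151643 is also present after tokenization.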
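
The check described above can be sketched as a small helper that works on plain lists of token ids. This is an illustration, not the code that was actually run: `eos_report` is a hypothetical name, and 151643 is taken from the paragraph above. The second call shows a case worth ruling out, where truncation to a maximum length drops the trailing EOS even though the untruncated example contains it.

```python
EOS_TOKEN_ID = 151643  # id reported in the post above


def eos_report(input_ids: list[int], eos_token_id: int = EOS_TOKEN_ID) -> dict[str, object]:
    """Summarize where the EOS id occurs in one tokenized training example."""
    positions = [i for i, tok in enumerate(input_ids) if tok == eos_token_id]
    return {
        "count": len(positions),
        "positions": positions,
        "ends_with_eos": bool(input_ids) and input_ids[-1] == eos_token_id,
    }


# Toy ids standing in for a tokenized prompt + completion.
example = [101, 102, 103, EOS_TOKEN_ID]
full = eos_report(example)
truncated = eos_report(example[:3])  # simulates truncation to a max length of 3

print(full)
print(truncated)
```

Running the same report over the batches the trainer actually sees, after any truncation or packing, is more telling than checking the raw dataset alone.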

Sample code:

from datasets import Dataset
from transformers import AutoTokenizer, Qwen2ForCausalLM
from transformers.trainer_utils import IntervalStrategy
from transformers.training_args import OptimizerNames
from trl import SFTConfig, SFTTrainer

# OUTPUT_DIR, LOG_DIR, TRAIN_EPOCHS, BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS,
# LEARNING_RATE, WEIGHT_DECAY and WARMUP_RATIO are defined elsewhere
# (omitted in the original post).
MODEL_NAME = "./data/DeepSeek-R1-Distill-Qwen-7B"

args = SFTConfig(
    output_dir=OUTPUT_DIR,
    do_train=True,
    logging_first_step=True,
    logging_dir=LOG_DIR,
    logging_steps=100,
    save_strategy=IntervalStrategy.EPOCH,  # checkpoint once per epoch
    save_steps=1,
    num_train_epochs=TRAIN_EPOCHS,
    optim=OptimizerNames.ADAMW_TORCH,
    per_device_train_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    learning_rate=LEARNING_RATE,
    weight_decay=WEIGHT_DECAY,
    warmup_ratio=WARMUP_RATIO,
    max_seq_length=1024,  # longer examples are truncated to this length
)

with args.main_process_first(local=False, desc="loading tokenizer"):
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

with args.main_process_first(local=False, desc="loading model"):
    model = Qwen2ForCausalLM.from_pretrained(MODEL_NAME, torch_dtype="auto", device_map="auto")

def preprocess_dataset(batch: dict[str, list[str]]):
    # Map the dataset's columns to the prompt/completion format that trl expects.
    return {"prompt": batch["input"], "completion": batch["target"]}

with args.main_process_first(local=False, desc="loading dataset"):
    train_dataset = Dataset()  # placeholder: load the (closed-source) dataset here
    # preprocess
    train_dataset = train_dataset.map(
        preprocess_dataset, batched=True, remove_columns=train_dataset.column_names
    )

trainer = SFTTrainer(
    model,
    args=args,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)

if __name__ == "__main__":
    print("Start training...")  # noqa: T201
    trainer.train()

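HuanLin changed discussion title from "Generation does not stop after fine-tuning on a dataset without thinking" to "Generation does not work properly after fine-tuning on a dataset without thinking" Jan 26, 2025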

zzzbbypolyu

Mar 20, 2025

Hello, have you solved this problem?
