Instructions for using EssentialAI/rnj-1-instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use EssentialAI/rnj-1-instruct with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="EssentialAI/rnj-1-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EssentialAI/rnj-1-instruct")
model = AutoModelForCausalLM.from_pretrained("EssentialAI/rnj-1-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
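If you are loading on a GPU, Transformers can pick the checkpoint's dtype and place the weights automatically. A minimal sketch, assuming a CUDA-capable machine with the `accelerate` package installed:

```python
from transformers import AutoModelForCausalLM

# torch_dtype="auto" reads the dtype stored in the checkpoint;
# device_map="auto" spreads weights across available devices (needs `accelerate`).
model = AutoModelForCausalLM.from_pretrained(
    "EssentialAI/rnj-1-instruct",
    torch_dtype="auto",
    device_map="auto",
)
```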
- Inference
- HuggingChat
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use EssentialAI/rnj-1-instruct with vLLM:
Install from pip and serve the model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "EssentialAI/rnj-1-instruct"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "EssentialAI/rnj-1-instruct",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
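Since vLLM exposes an OpenAI-compatible API, the same server can also be called from Python with the official `openai` client. A minimal sketch, assuming the server above is running on localhost:8000 (the API key can be any placeholder unless the server was started with one):

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server ignores the key unless one was configured.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="EssentialAI/rnj-1-instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```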
Use Docker

```bash
docker model run hf.co/EssentialAI/rnj-1-instruct
```
- SGLang
How to use EssentialAI/rnj-1-instruct with SGLang:
Install from pip and serve the model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "EssentialAI/rnj-1-instruct" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "EssentialAI/rnj-1-instruct",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
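Because the endpoint is OpenAI-compatible, streaming works the same way as with any OpenAI-style server. A minimal sketch, assuming the SGLang server above is running on localhost:30000:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

# stream=True yields chunks as tokens are generated instead of one final response.
stream = client.chat.completions.create(
    model="EssentialAI/rnj-1-instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```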
Use Docker images

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "EssentialAI/rnj-1-instruct" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "EssentialAI/rnj-1-instruct",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

- Docker Model Runner
How to use EssentialAI/rnj-1-instruct with Docker Model Runner:
```bash
docker model run hf.co/EssentialAI/rnj-1-instruct
```
After a long time, finally a capable little coder!
Hello, EssentialAI!
Congratulations on your model release! I tested your model briefly in the demo space and I think it's pretty good for its size!
I can't wait to be able to use it locally in LM Studio (llama.cpp based), hopefully the support will be merged soon.
When I test this model in the demo, I can't help but wonder what kinds of things I could do with a model like this if it were slightly bigger, say around 24B.
Despite its smaller size, it is in some ways comparable to the much bigger GPT-OSS 20B. If this model were about the same size, maybe it would be even better while still staying reasonably small. So I do think this is a good foundation for something bigger.
Are there any plans for bigger versions? Maybe an MoE like GPT-OSS 20B for faster inference? I know it's probably too soon to ask, but you do have a good thing here, so I'm genuinely curious about potential future releases.
What tasks did you test it on? With my usual coding test questions, it mostly failed.
How's the tool-use capability? Nowadays a model needs to support agentic coding to be useful.
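One way to probe this yourself is to send an OpenAI-style tool definition to a locally served copy and see whether the model emits a structured tool call. A minimal sketch, assuming the vLLM server from above is running on localhost:8000 (the tool name is hypothetical, and whether the parsed `tool_calls` field is populated depends on the server being configured for tool calling):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A single hypothetical tool; a tool-capable model should answer with a
# structured call rather than plain text.
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",  # hypothetical tool name for this probe
        "description": "Read a file from disk and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

response = client.chat.completions.create(
    model="EssentialAI/rnj-1-instruct",
    messages=[{"role": "user", "content": "Open README.md and summarize it."}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```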
It's a great model for its size range, and honestly it's better to stay this size (8B is great for my potato GPU). In my little internal tests it's better than Grok Code and Qwen Code; they really cooked with this one.
I used the 8-bit (Q8) quant of the MLX version of this model on my M4 Max MacBook Pro.
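For anyone wanting to reproduce this setup, MLX conversions are usually run through the `mlx-lm` package. A minimal sketch, assuming a converted checkpoint exists (the repo id below is hypothetical; substitute the actual community upload):

```python
from mlx_lm import load, generate

# Hypothetical MLX conversion repo id, for illustration only.
model, tokenizer = load("mlx-community/rnj-1-instruct-8bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Who are you?"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=100))
```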
Thank you for surfacing this, and I'm glad you're enjoying the model. This is related to the truncation issue, also brought up here. Rest assured, we're working to fix this!
Maybe it is because config.json sets eos_token_id to 1 (which would be `"`). Also, generation_config.json has an eos_token_id different from the tokenizer's.
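A quick way to check the mismatch being described is to compare the ids from the three places they can be defined, and, as a possible workaround, override the id at generation time:

```python
from transformers import AutoConfig, AutoTokenizer, GenerationConfig

repo = "EssentialAI/rnj-1-instruct"
tokenizer = AutoTokenizer.from_pretrained(repo)

# The three sources that can disagree:
print("tokenizer:        ", tokenizer.eos_token_id)
print("config.json:      ", AutoConfig.from_pretrained(repo).eos_token_id)
print("generation_config:", GenerationConfig.from_pretrained(repo).eos_token_id)

# Possible workaround until the repo is fixed: force the tokenizer's id, e.g.
# model.generate(**inputs, eos_token_id=tokenizer.eos_token_id)
```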
