Instructions for using EssentialAI/rnj-1-instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use EssentialAI/rnj-1-instruct with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="EssentialAI/rnj-1-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EssentialAI/rnj-1-instruct")
model = AutoModelForCausalLM.from_pretrained("EssentialAI/rnj-1-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
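If you are loading on a GPU, Transformers can pick the checkpoint's dtype and place the weights automatically. A minimal sketch, assuming a CUDA-capable machine with the `accelerate` package installed:

```python
from transformers import AutoModelForCausalLM

# torch_dtype="auto" reads the dtype stored in the checkpoint;
# device_map="auto" spreads weights across available devices (needs `accelerate`).
model = AutoModelForCausalLM.from_pretrained(
    "EssentialAI/rnj-1-instruct",
    torch_dtype="auto",
    device_map="auto",
)
```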
- Inference
- HuggingChat
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use EssentialAI/rnj-1-instruct with vLLM:
Install from pip and serve the model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "EssentialAI/rnj-1-instruct"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "EssentialAI/rnj-1-instruct",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
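Since vLLM exposes an OpenAI-compatible API, the same server can also be called from Python with the official `openai` client. A minimal sketch, assuming the server above is running on localhost:8000 (the API key can be any placeholder unless the server was started with one):

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server ignores the key unless one was configured.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="EssentialAI/rnj-1-instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```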
Use Docker

```bash
docker model run hf.co/EssentialAI/rnj-1-instruct
```
- SGLang
How to use EssentialAI/rnj-1-instruct with SGLang:
Install from pip and serve the model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "EssentialAI/rnj-1-instruct" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "EssentialAI/rnj-1-instruct",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
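Because the endpoint is OpenAI-compatible, streaming works the same way as with any OpenAI-style server. A minimal sketch, assuming the SGLang server above is running on localhost:30000:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

# stream=True yields chunks as tokens are generated instead of one final response.
stream = client.chat.completions.create(
    model="EssentialAI/rnj-1-instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```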
Use Docker images

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "EssentialAI/rnj-1-instruct" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "EssentialAI/rnj-1-instruct",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

- Docker Model Runner
How to use EssentialAI/rnj-1-instruct with Docker Model Runner:
```bash
docker model run hf.co/EssentialAI/rnj-1-instruct
```
After a long time, finally a capable little coder!
Hello, EssentialAI!
Congratulations on your model release! I tested your model briefly in the demo space and I think it's pretty good for its size!
I can't wait to be able to use it locally in LM Studio (llama.cpp based), hopefully the support will be merged soon.
When I test this model in the demo, I can't help but wonder what kinds of things I could do with a model like this if it were slightly bigger, say around 24B.
Despite its smaller size, it is in some ways comparable to the much bigger GPT-OSS 20B. If this model were about the same size, maybe it would be even better while still staying reasonably small. So I do think this is a good foundation for something bigger.
Are there any plans for bigger versions? Maybe an MoE like GPT-OSS 20B for faster inference? I know it's probably too soon to ask, but you do have a good thing here, so I'm genuinely curious about potential future releases.
What tasks did you test it on? With my usual coding test questions, it mostly failed.
How's the tool-use capability? Nowadays a model needs to support agentic coding to be useful.
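One way to probe this yourself is to send an OpenAI-style tool definition to a locally served copy and see whether the model emits a structured tool call. A minimal sketch, assuming the vLLM server from above is running on localhost:8000 (the tool name is hypothetical, and whether the parsed `tool_calls` field is populated depends on the server being configured for tool calling):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A single hypothetical tool; a tool-capable model should answer with a
# structured call rather than plain text.
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",  # hypothetical tool name for this probe
        "description": "Read a file from disk and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

response = client.chat.completions.create(
    model="EssentialAI/rnj-1-instruct",
    messages=[{"role": "user", "content": "Open README.md and summarize it."}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```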
It's a great model for its size range, and honestly it's better to stay this size (8B is great for my potato GPU). In my little internal tests it's better than Grok Code and Qwen Code; they really cooked with this one.
I used the 8-bit (Q8) quant of the MLX version of this model on my M4 Max MacBook Pro.
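For anyone wanting to reproduce this setup, MLX conversions are usually run through the `mlx-lm` package. A minimal sketch, assuming a converted checkpoint exists (the repo id below is hypothetical; substitute the actual community upload):

```python
from mlx_lm import load, generate

# Hypothetical MLX conversion repo id, for illustration only.
model, tokenizer = load("mlx-community/rnj-1-instruct-8bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Who are you?"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=100))
```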
Thank you for surfacing this, and I'm glad you're enjoying the model. This is related to the truncation issue, also brought up here. Rest assured, we're working to fix this!
Maybe it is because config.json sets eos_token_id to 1 (which would be `"`). Also, generation_config.json has an eos_token_id different from the tokenizer's.
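A quick way to check the mismatch being described is to compare the ids from the three places they can be defined, and, as a possible workaround, override the id at generation time:

```python
from transformers import AutoConfig, AutoTokenizer, GenerationConfig

repo = "EssentialAI/rnj-1-instruct"
tokenizer = AutoTokenizer.from_pretrained(repo)

# The three sources that can disagree:
print("tokenizer:        ", tokenizer.eos_token_id)
print("config.json:      ", AutoConfig.from_pretrained(repo).eos_token_id)
print("generation_config:", GenerationConfig.from_pretrained(repo).eos_token_id)

# Possible workaround until the repo is fixed: force the tokenizer's id, e.g.
# model.generate(**inputs, eos_token_id=tokenizer.eos_token_id)
```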
