llm-jp-4-8b-thinking-gguf Model Card

This is a GGUF conversion that can run inference while preserving at least the essentials of the chat template. The original model is llm-jp/llm-jp-4-8b-thinking.

Usage

There are two ways to use this model:

llama.cpp (llama-completion only)
LM Studio

llama.cpp (llama-completion only)

Run the following command.

./build/bin/llama-completion \
--model /path/to/llm-jp-4-8B-thinking-Q4_K_M.gguf \
-p "<|start|>system<|message|>You are LLM-jp-4, a large language model trained by LLM-jp.\nKnowledge cutoff: 2025-12\nCurrent date: 2026-04-03\n\nReasoning: medium\n\n# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>user<|message|>自然言語処理とは何か<|end|><|start|>assistant"

Note that llama-cli and llama-server do not work correctly with this model because of how the runtime behaves. Please keep this limitation in mind.
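The prompt passed to -p in the command above follows a fixed layout: a system message, a user message, and an opening assistant turn, each delimited by <|start|>, <|message|>, and <|end|>. The following is a minimal sketch for assembling that string for an arbitrary user message. The names SYSTEM_TEMPLATE and build_prompt are hypothetical, and it assumes that the layout shown in the command is all the model needs; whether reasoning levels other than medium are accepted is also an assumption.

```python
# Hypothetical helper: rebuilds the raw prompt string used with llama-completion.
# The layout is copied from the command above; nothing here is an official API.

SYSTEM_TEMPLATE = (
    "You are LLM-jp-4, a large language model trained by LLM-jp.\n"
    "Knowledge cutoff: 2025-12\n"
    "Current date: {current_date}\n\n"
    "Reasoning: {reasoning}\n\n"
    "# Valid channels: analysis, commentary, final. "
    "Channel must be included for every message."
)


def build_prompt(
    user_message: str,
    current_date: str = "2026-04-03",  # value taken from the example command
    reasoning: str = "medium",         # other levels are an assumption
) -> str:
    """Return the prompt text, ending with an open assistant turn."""
    system = SYSTEM_TEMPLATE.format(current_date=current_date, reasoning=reasoning)
    return (
        f"<|start|>system<|message|>{system}<|end|>"
        f"<|start|>user<|message|>{user_message}<|end|>"
        "<|start|>assistant"
    )


if __name__ == "__main__":
    print(build_prompt("自然言語処理とは何か"))
```

The returned string contains real newline characters where the shell command writes \n, so it can be passed as a single argument to -p, for example through subprocess.run.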

LM Studio

Download the model as usual and run it without setting any custom instructions. Entering 「こんにちは」 ("Hello") produces output like the following.

<|channel|> analysis<|message|> The user says "こんにちは" (Hello). We should respond appropriately in Japanese. Probably greet back and ask how can help. Use polite tone.<|end|><|start|> assistant<|channel|> final<|message|> こんにちは！今日はどのようなお手伝いをしましょうか？お気軽にお知らせください。

If the thinking portion gets in the way, configure Reasoning Parsing as follows.

Start String: <|channel|>
End String: final<|message|>

The output should then be displayed as follows.

Thought for 1.50 seconds
こんにちは！今日はどのようなお手伝いをさせていただけますか？
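When the raw output is consumed outside LM Studio (for example, from llama-completion), the same separation can be done by hand. The sketch below mirrors the Start String and End String settings above: everything between the first <|channel|> and the following final<|message|> is treated as reasoning, and the rest as the answer. The function name split_reasoning is hypothetical, and this is an approximation of the behavior, not LM Studio's actual implementation.

```python
from typing import Tuple

START = "<|channel|>"      # Start String from the settings above
END = "final<|message|>"   # End String from the settings above


def split_reasoning(raw: str, start: str = START, end: str = END) -> Tuple[str, str]:
    """Split raw model output into (reasoning, answer)."""
    s = raw.find(start)
    if s == -1:
        # No reasoning marker: the whole text is the answer.
        return "", raw.strip()
    e = raw.find(end, s + len(start))
    if e == -1:
        # Reasoning was opened but never closed (e.g. truncated output).
        return raw[s + len(start):].strip(), ""
    reasoning = raw[s + len(start):e].strip()
    answer = (raw[:s] + raw[e + len(end):]).strip()
    return reasoning, answer


if __name__ == "__main__":
    sample = (
        "<|channel|> analysis<|message|> The user says \"こんにちは\" (Hello). "
        "Use polite tone.<|end|><|start|> assistant<|channel|> final<|message|> "
        "こんにちは！今日はどのようなお手伝いをしましょうか？"
    )
    reasoning, answer = split_reasoning(sample)
    print(answer)
```

Note that the reasoning part still contains the intermediate control tokens (such as <|end|> and <|start|>), just as the settings above place them inside the hidden span.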

How the GGUF was created

TBA

Hint:

./llama-quantize \
  --override-kv tokenizer.ggml.add_space_prefix=bool:false \
  model-f16.gguf \
  model-f16-no-prefix.gguf \
  copy

and here.

Model size: 9B params
Architecture: llama
Quantizations: 4-bit, 8-bit, 16-bit
