This repository contains a community-converted TensorRT-LLM checkpoint for `microsoft/Phi-4-mini-instruct`.
It is a TensorRT-LLM **checkpoint-format** repository, not a prebuilt engine. The intent is to let you download the checkpoint from Hugging Face and build an engine locally for your own GPU and TensorRT-LLM version.
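As a quick sanity check on that distinction: a TensorRT-LLM checkpoint directory typically holds a `config.json` plus `rank*.safetensors` weight shards, whereas a built engine directory holds serialized `.engine` files. The sketch below is illustrative; `CKPT_DIR` is a placeholder for wherever you downloaded this repository, not a path defined by this repo:

```shell
# Distinguishing a TensorRT-LLM checkpoint from a built engine (illustrative;
# CKPT_DIR is a placeholder for wherever this repo was downloaded).
CKPT_DIR=${CKPT_DIR:-./trtllm-ckpt}

if ls "$CKPT_DIR"/rank*.safetensors >/dev/null 2>&1; then
  echo "checkpoint: run trtllm-build to produce an engine"
elif ls "$CKPT_DIR"/*.engine >/dev/null 2>&1; then
  echo "engine: ready to load for inference"
else
  echo "no TensorRT-LLM artifacts found in $CKPT_DIR"
fi
```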

## Who This Repo Is For

This repository is for users who already work with TensorRT-LLM and want a ready-made **TensorRT-LLM checkpoint** that they can turn into a local engine for their own GPU.

It is **not**:

- a prebuilt TensorRT engine
- a plain Transformers checkpoint
- an Ollama model
- a one-click chat model that can be run directly after download

## How to Use

1. Download this repository from Hugging Face.
2. Build a local engine with `trtllm-build` for your own GPU and TensorRT-LLM version.
3. Run inference with the engine you built.

The `Build Example` section below shows the validated local command used for the benchmark snapshot in this README.
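The three steps can be sketched in shell. The repository id, directory names, and `trtllm-build` flags below are assumptions for illustration only; they are not this repo's actual id or the validated command:

```shell
# Sketch of the three steps; REPO_ID and the build flags are illustrative
# placeholders, not this repo's actual id or its validated build command.
REPO_ID="your-org/phi-4-mini-instruct-trtllm"   # placeholder repo id
CKPT_DIR=./trtllm-ckpt
ENGINE_DIR=./trtllm-engine

if command -v trtllm-build >/dev/null 2>&1; then
  # 1. Download the checkpoint repository from Hugging Face.
  huggingface-cli download "$REPO_ID" --local-dir "$CKPT_DIR"

  # 2. Build an engine for the local GPU and installed TensorRT-LLM version.
  trtllm-build --checkpoint_dir "$CKPT_DIR" --output_dir "$ENGINE_DIR" --gemm_plugin auto

  # 3. Run inference (run.py lives in the TensorRT-LLM examples/ directory).
  python3 examples/run.py --engine_dir "$ENGINE_DIR" \
    --tokenizer_dir microsoft/Phi-4-mini-instruct \
    --input_text "Hello" --max_output_len 64
else
  echo "TensorRT-LLM is not installed; commands shown for reference only."
fi
```

In practice, replace `REPO_ID` with this repository's actual id and copy the flags from the `Build Example` section rather than the placeholders here.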

## Model Characteristics

- Base model: `microsoft/Phi-4-mini-instruct`