Update README.md

Add reference to original repository. Change `model_name` and `revision` to point to the updated model weights.
tags:
- mosaicML
- sharded
- instruct
---

# mpt-7b-instruct: sharded

This is a version of the [mpt-7b-instruct](https://huggingface.co/mosaicml/mpt-7b-instruct) model, sharded into 2 GB chunks for low-RAM loading (e.g. on Colab).
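For intuition, sharding just means splitting the checkpoint's weights across several files, each under a size cap (in `transformers`, `save_pretrained(..., max_shard_size='2GB')` does this for real checkpoints). Below is a minimal sketch of the greedy packing idea, using NumPy arrays as stand-in tensors; the function name and toy sizes are illustrative, not from the original repo:

```python
import numpy as np

def shard_state_dict(state_dict, max_shard_bytes):
    """Greedily pack arrays into shards whose total byte size stays under
    max_shard_bytes (a single oversized array still gets its own shard)."""
    shards, current, current_size = [], {}, 0
    for name, array in state_dict.items():
        size = array.nbytes
        if current and current_size + size > max_shard_bytes:
            shards.append(current)
            current, current_size = {}, 0
        current[name] = array
        current_size += size
    if current:
        shards.append(current)
    return shards

# Toy example: four 1 MiB float32 arrays with a 2 MiB cap -> 2 shards
sd = {f'w{i}': np.zeros(262144, dtype=np.float32) for i in range(4)}
shards = shard_state_dict(sd, 2 * 1024 * 1024)
```

Each shard would then be written to its own file, with an index mapping weight names to files, which is exactly the layout `from_pretrained` knows how to reassemble.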
The weights are stored in `bfloat16`, so in theory you can run this on CPU, though it may take forever.
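To make the memory requirement concrete: `bfloat16` stores each weight in 2 bytes, so the raw weight memory is roughly the parameter count times two. A back-of-envelope sketch (the 7B parameter count is nominal, and runtime overhead for activations comes on top):

```python
# Back-of-envelope memory estimate for the weights alone
# (activations, KV cache, and framework overhead come on top).
N_PARAMS = 7e9        # nominal "7B" parameter count (approximate)
BYTES_PER_PARAM = 2   # bfloat16 = 2 bytes per weight

weight_gb = N_PARAMS * BYTES_PER_PARAM / 1024**3
print(f'~{weight_gb:.0f} GB just to hold the weights')  # prints "~13 GB just to hold the weights"
```

This is why the 2 GB shards matter: the loader can stream them one at a time instead of needing the full ~13 GB in RAM at once during download and dispatch.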
Original code and credits go to [mpt-7b-storywriter-sharded](https://huggingface.co/ethzanalytics/mpt-7b-storywriter-sharded).

See the [community discussion](https://huggingface.co/ethzanalytics/mpt-7b-storywriter-sharded/discussions/2) on how to replicate this.

Please refer to the previously linked repo for details on usage/implementation/etc. This model was downloaded from the original repo under Apache-2.0 and is redistributed under the same license.
## Basic Usage

Load the model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'jprafael/mpt-7b-instruct-sharded'
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    revision='8d8911ad980f48f8a791e5f5876dea891dcbc064',  # optional, but a good idea
    device_map='auto',
    load_in_8bit=False,  # install bitsandbytes, then set to True for 8-bit
)
```
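The loading snippet imports `AutoTokenizer` but stops after loading the model. A typical follow-up, sketched here as an assumption rather than taken from the original card, wraps tokenization, `generate`, and decoding in one helper:

```python
# Hypothetical usage sketch (not from the original card): tokenize a
# prompt, run generation, and decode. Prompt and settings are illustrative.
def generate_text(model, tokenizer, prompt, max_new_tokens=64):
    """Run generation for a single prompt and return the decoded text."""
    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# e.g., with the objects from the loading snippet above:
# tokenizer = AutoTokenizer.from_pretrained(model_name)
# print(generate_text(model, tokenizer, 'Explain model sharding briefly.'))
```

Moving the inputs to `model.device` matters with `device_map='auto'`, since the embedding layer may have been dispatched to a GPU.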