---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- transformers
datasets:
- mwitiderrick/AlpacaCode
base_model: mwitiderrick/open_llama_3b_code_instruct_0.1
inference: true
model_type: llama
prompt_template: "<s>[INST] \n{prompt}\n[/INST]\n"
created_by: mwitiderrick
pipeline_tag: text-generation
model-index:
- name: mwitiderrick/open_llama_3b_instruct_v_0.2
  results:
  - task:
      type: text-generation
    dataset:
      name: hellaswag
      type: hellaswag
    metrics:
    - type: hellaswag (0-Shot)
      value: 0.66
      name: hellaswag(0-Shot)
  - task:
      type: text-generation
    dataset:
      name: winogrande
      type: winogrande
    metrics:
    - type: winogrande (0-Shot)
      value: 0.6322
      name: winogrande(0-Shot)
  - task:
      type: text-generation
    dataset:
      name: arc_challenge
      type: arc_challenge
    metrics:
    - type: arc_challenge (0-Shot)
      value: 0.3447
      name: arc_challenge(0-Shot)
    source:
      url: https://huggingface.co/mwitiderrick/open_llama_3b_instruct_v_0.2
      name: open_llama_3b_instruct_v_0.2 model card
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 40.7
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/open_llama_3b_glaive_assistant_v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 67.45
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/open_llama_3b_glaive_assistant_v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 27.74
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/open_llama_3b_glaive_assistant_v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 35.86
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/open_llama_3b_glaive_assistant_v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.72
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/open_llama_3b_glaive_assistant_v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 1.97
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/open_llama_3b_glaive_assistant_v0.1
      name: Open LLM Leaderboard
---
# OpenLLaMA Glaive: An Open Reproduction of LLaMA

This is the [OpenLLaMA Code Instruct](https://huggingface.co/mwitiderrick/open_llama_3b_code_instruct_0.1) model, fine-tuned for 1 epoch on the
[Glaive Assistant](https://huggingface.co/datasets/mwitiderrick/glaive-code-assistant) dataset.

## Prompt Template
```
<s>[INST] {{ user_msg }} [/INST]

```
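As a minimal sketch, the template above can be filled in with a small helper. The name `build_prompt` is hypothetical and not part of this model card, and the single-space layout around the message is assumed from the template as written.

```python
def build_prompt(user_msg: str) -> str:
    """Wrap a user message in the [INST] ... [/INST] template shown above."""
    # Assumption: surrounding whitespace in the message is not meaningful.
    return f"<s>[INST] {user_msg.strip()} [/INST]"


print(build_prompt("Write a quick sort algorithm in Python"))
```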
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the tokenizer and the model from the same repository
# (name taken from the leaderboard links in this card).
model_id = "mwitiderrick/open_llama_3b_glaive_assistant_v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

query = "Write a quick sort algorithm in Python"
# max_length counts prompt and generated tokens together, so long answers are cut off.
text_gen = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
output = text_gen(f"<s>[INST]{query}[/INST]")
print(output[0]['generated_text'])
"""
<s>[INST]Write a quick sort algorithm in Python[/INST]

Quick sort is a divide and conquer algorithm that sorts an array in-place.
It works by repeatedly dividing the array into two sub-arrays, sorting
them, and then merging them back together.

Here's a Python implementation of the quick sort algorithm:

def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[len(arr) // 2]
        left = [x for x in arr if x < pivot]
        right = [x for x in arr if x > pivot]
        return quick_sort(left) + [pivot] + quick_sort
"""
```
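The sample output above begins with the prompt itself, because the pipeline returns the full text by default. A minimal sketch of removing that echo in plain Python follows; `strip_prompt` is a hypothetical helper, not part of `transformers` or of this model card.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the completion when the output begins with the prompt."""
    if generated_text.startswith(prompt):
        # Drop the echoed prompt and any whitespace that follows it.
        return generated_text[len(prompt):].lstrip()
    # Fall back to the full text if the prompt is not echoed verbatim.
    return generated_text.strip()


prompt = "<s>[INST]Write a quick sort algorithm in Python[/INST]"
sample = prompt + "\n\nQuick sort is a divide and conquer algorithm."
print(strip_prompt(sample, prompt))
```

The text-generation pipeline also accepts `return_full_text=False`, which may make a helper like this unnecessary.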
## Metrics
[Detailed metrics](https://huggingface.co/datasets/open-llm-leaderboard/details_mwitiderrick__open_llama_3b_glaive_assistant_v0.1)
```
|  Tasks  |Version|Filter|n-shot| Metric |Value |   |Stderr|
|---------|-------|------|-----:|--------|-----:|---|-----:|
|hellaswag|Yaml   |none  |     0|acc     |0.4974|±  |0.0050|
|         |       |none  |     0|acc_norm|0.6600|±  |0.0047|

|  Groups  |Version|Filter|n-shot|  Metric   | Value  |   |Stderr|
|----------|-------|------|-----:|-----------|-------:|---|-----:|
|truthfulqa|N/A    |none  |     0|bleu_max   | 23.5771|±  |0.5407|
|          |       |none  |     0|bleu_acc   |  0.2754|±  |0.0002|
|          |       |none  |     0|bleu_diff  | -8.1019|±  |0.5137|
|          |       |none  |     0|rouge1_max | 49.5707|±  |0.6501|
|          |       |none  |     0|rouge1_acc |  0.2607|±  |0.0002|
|          |       |none  |     0|rouge1_diff| -9.8962|±  |0.5492|
|          |       |none  |     0|rouge2_max | 33.0399|±  |0.8237|
|          |       |none  |     0|rouge2_acc |  0.2313|±  |0.0002|
|          |       |none  |     0|rouge2_diff|-11.9054|±  |0.7963|
|          |       |none  |     0|rougeL_max | 46.3168|±  |0.6705|
|          |       |none  |     0|rougeL_acc |  0.2521|±  |0.0002|
|          |       |none  |     0|rougeL_diff|-10.1301|±  |0.5669|
|          |       |none  |     0|acc        |  0.3191|±  |0.0405|

|  Tasks   |Version|Filter|n-shot|Metric|Value |   |Stderr|
|----------|-------|------|-----:|------|-----:|---|-----:|
|winogrande|Yaml   |none  |     0|acc   |0.6322|±  |0.0136|

|    Tasks    |Version|Filter|n-shot| Metric |Value |   |Stderr|
|-------------|-------|------|-----:|--------|-----:|---|-----:|
|arc_challenge|Yaml   |none  |     0|acc     |0.3234|±  |0.0137|
|             |       |none  |     0|acc_norm|0.3447|±  |0.0139|
```
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_mwitiderrick__open_llama_3b_glaive_assistant_v0.1)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |39.74|
|AI2 Reasoning Challenge (25-Shot)|40.70|
|HellaSwag (10-Shot)              |67.45|
|MMLU (5-Shot)                    |27.74|
|TruthfulQA (0-shot)              |35.86|
|Winogrande (5-shot)              |64.72|
|GSM8k (5-shot)                   | 1.97|
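
The `Avg.` row matches the unweighted mean of the six benchmark scores in the table. The short check below reproduces it; the dictionary simply copies the values from the table, and the variable names are illustrative.

```python
# Scores copied from the leaderboard table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 40.70,
    "HellaSwag (10-Shot)": 67.45,
    "MMLU (5-Shot)": 27.74,
    "TruthfulQA (0-shot)": 35.86,
    "Winogrande (5-shot)": 64.72,
    "GSM8k (5-shot)": 1.97,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")  # prints "Avg. 39.74"
```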