| --- |
| library_name: transformers |
| tags: |
| - synthetic |
| license: apache-2.0 |
| datasets: |
| - teknium/OpenHermes-2.5 |
| - Iker/OpenHermes-2.5-Spanish |
| - projecte-aina/RAG_Multilingual |
| - Iker/Document-Translation-en-es |
| - Iker/InstructTranslation-EN-ES |
| - Helsinki-NLP/opus-100 |
| - glaiveai/glaive-code-assistant-v3 |
| - glaiveai/glaive-function-calling-v2 |
| language: |
| - es |
| - en |
| pipeline_tag: text-generation |
| base_model: google/gemma-2b |
| --- |
| |
|
|
|  |
|
|
|
|
| # Neurona 2B Beta: A Spanish Language Model |
|
|
| > This is a preliminary version of the model card. The model is still under development and this is not the final version. For more information about this model, write to iker.garciaf@ehu.eus |
|
|
|
|
| Neurona 2B is a Spanish language model. This is the first iteration, an experiment to get the scripts and infrastructure ready. |
|
|
| Neurona 2B was trained on the following datasets: |
|
|
| - [teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) |
| - [Iker/OpenHermes-2.5-Spanish](https://huggingface.co/datasets/Iker/OpenHermes-2.5-Spanish) |
| - [Iker/Document-Translation-en-es](https://huggingface.co/datasets/Iker/Document-Translation-en-es) |
| - [Iker/InstructTranslation-EN-ES](https://huggingface.co/datasets/Iker/InstructTranslation-EN-ES) |
| - [Helsinki-NLP/opus-100 (en-es, only a few examples to reach 1 million instructions)](https://huggingface.co/datasets/Helsinki-NLP/opus-100) |
| - [projecte-aina/RAG_Multilingual (es only, 3701 examples)](https://huggingface.co/datasets/projecte-aina/RAG_Multilingual) |
| - [glaiveai/glaive-code-assistant-v3](https://huggingface.co/datasets/glaiveai/glaive-code-assistant-v3) |
| - [glaiveai/glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) |
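
The opus-100 entry above is described as a small filler used to bring the mixture up to 1 million instructions. The card does not say how those examples were chosen, so the following is only a minimal sketch of one way to top up a mixture to a target size by random sampling. The function `top_up`, its parameters, and the toy data are hypothetical and are not taken from the author's scripts.

```python
import random


def top_up(mixture, filler, target_size, seed=0):
    """Return `mixture` plus enough random `filler` examples to reach `target_size`.

    Hypothetical helper: the card does not describe the actual sampling procedure.
    """
    missing = target_size - len(mixture)
    if missing <= 0:
        return list(mixture)  # already large enough, nothing to add
    if missing > len(filler):
        raise ValueError("filler pool is too small to reach the target size")
    rng = random.Random(seed)  # seeded so the selection is reproducible
    return list(mixture) + rng.sample(filler, missing)


# Toy stand-ins for the real datasets (made-up data, tiny sizes).
instructions = [f"instruction-{i}" for i in range(7)]
translation_pairs = [f"en-es-pair-{i}" for i in range(100)]

mixed = top_up(instructions, translation_pairs, target_size=10, seed=0)
print(len(mixed))
```

With the real data, `target_size` would be 1,000,000 and `filler` would be the en-es split of opus-100.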
|
|
| This mix of English and Spanish datasets lets the model acquire a range of capabilities, such as RAG, function calling, code assistance, question answering, and summarization, in both English and Spanish. |
|
|
| # Training |
|
|
| This model was trained using 4x NVIDIA A100 80GB GPUs and axolotl |
| [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl) |
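
As a rough worked example, the four GPUs can be combined with the `micro_batch_size`, `gradient_accumulation_steps`, and `sequence_len` values from the configuration below to estimate the effective batch size. This sketch assumes plain data parallelism, with each GPU processing its own micro-batches. Because `sample_packing` is enabled, the token count is an upper bound, not an exact figure.

```python
# Values taken from this card: 4 GPUs, plus three settings from the axolotl config.
num_gpus = 4
micro_batch_size = 2
gradient_accumulation_steps = 32
sequence_len = 8192

# Assumption: every GPU handles its own micro-batches (data parallelism).
sequences_per_step = num_gpus * micro_batch_size * gradient_accumulation_steps
max_tokens_per_step = sequences_per_step * sequence_len

print(sequences_per_step)   # 256 packed sequences per optimizer step
print(max_tokens_per_step)  # 2097152 tokens per optimizer step, at most
```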
|
|
| This is the configuration used: |
|
|
| ```yaml |
| base_model: google/gemma-2b |
| model_type: AutoModelForCausalLM |
| tokenizer_type: AutoTokenizer |
| is_falcon_derived_model: |
| is_llama_derived_model: |
| is_qwen_derived_model: |
| is_mistral_derived_model: |
| |
| load_in_8bit: false |
| load_in_4bit: false |
| strict: false |
| |
| device_map: null |
| |
| datasets: |
|   - path: /ikerlariak/igarcia945/Mortadelo-Filemon/final_dataset/OpenHermes-2.5-Spanish_fix_gpt.jsonl |
|     type: sharegpt |
|     conversation: chatml |
|     field: conversations |
|     roles: |
|       input: |
|         - system |
|         - gpt |
|       output: |
|         - human |
|   - path: /ikerlariak/igarcia945/Mortadelo-Filemon/final_dataset/OpenHermes-2.5-English.jsonl |
|     type: sharegpt |
|     conversation: chatml |
|     field: conversations |
|   - path: /ikerlariak/igarcia945/Mortadelo-Filemon/final_dataset/glaive-function-calling-v2.jsonl |
|     type: sharegpt |
|     conversation: chatml |
|     field: conversations |
|     roles: |
|       input: |
|         - system |
|         - gpt |
|         - tool |
|       output: |
|         - human |
|   - path: /ikerlariak/igarcia945/Mortadelo-Filemon/final_dataset/glaive-code-assistant-v3-small.jsonl |
|     type: sharegpt |
|     conversation: chatml |
|     field: conversations |
|     roles: |
|       input: |
|         - system |
|         - gpt |
|       output: |
|         - human |
| chat_template: chatml |
| |
| dataset_prepared_path: /ikerlariak/igarcia945/Mortadelo-Filemon/gemma-2b-spanish/dataset |
| |
| shuffle_merged_datasets: true |
| |
| val_set_size: 0.005 |
| |
| output_dir: /ikerlariak/igarcia945/Mortadelo-Filemon/gemma-2b-spanish/ |
| |
| adapter: |
| lora_model_dir: |
| |
| sequence_len: 8192 |
| sample_packing: true |
| eval_sample_packing: false |
| pad_to_sequence_len: false |
| |
| special_tokens: |
|   bos_token: "<|im_start|>" |
|   eos_token: "<|im_end|>" |
|   pad_token: "<|end_of_text|>" |
| |
| tokens: |
|   - "<|begin_of_text|>" |
|   - "<|end_of_text|>" |
|   - "<|im_start|>" |
|   - "<|im_end|>" |
|   - "<|start_header_id|>" |
|   - "<|end_header_id|>" |
|   - "<tool_call>" |
|   - "<tool_response>" |
|   - "<tools>" |
|   - "</tool_call>" |
|   - "</tool_response>" |
|   - "</tools>" |
|   - "<reserved1>" |
|   - "<reserved2>" |
|   - "<reserved3>" |
|   - "<reserved4>" |
| |
| |
| |
| neftune_noise_alpha: 5 |
| |
| wandb_project: Mortadelo&Filemon |
| wandb_entity: igarciaf |
| wandb_watch: |
| wandb_name: gemma2b |
| wandb_log_model: |
| |
| gradient_accumulation_steps: 32 |
| micro_batch_size: 2 |
| eval_batch_size: 2 |
| num_epochs: 3 |
| optimizer: adamw_torch_fused |
| lr_scheduler: cosine |
| learning_rate: 0.00007 |
| |
| |
| train_on_inputs: false |
| group_by_length: false |
| bf16: true |
| fp16: false |
| tf32: false |
| |
| gradient_checkpointing: true |
| early_stopping_patience: |
| resume_from_checkpoint: |
| local_rank: |
| logging_steps: 1 |
| xformers_attention: |
| flash_attention: true |
| |
| warmup_ratio: 0.03 |
| evals_per_epoch: 4 |
| eval_table_size: |
| save_strategy: "no" |
| debug: |
| deepspeed: /ikerlariak/igarcia945/Mortadelo-Filemon/train_configs/deepspeed_zero3.json |
| weight_decay: 0.0 |
| fsdp: |
| fsdp_config: |
| special_tokens: |
| |
| seed: 33 |
| ``` |
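
The configuration reads ShareGPT-style records (`type: sharegpt`, `field: conversations`) and renders them with the ChatML template, using `<|im_start|>` and `<|im_end|>` as delimiters. The sketch below shows roughly what that rendering looks like for a single record. The role mapping (`human` to `user`, `gpt` to `assistant`) and the helper `sharegpt_to_chatml` are assumptions for illustration, not axolotl's own code, and the exact whitespace produced by axolotl may differ.

```python
# Assumed mapping from ShareGPT speaker names to ChatML role names.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant", "tool": "tool"}


def sharegpt_to_chatml(record, add_generation_prompt=False):
    """Render one ShareGPT-style record as ChatML text (illustrative helper)."""
    parts = []
    for turn in record["conversations"]:
        role = ROLE_MAP[turn["from"]]
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here at inference time.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Made-up record in the ShareGPT layout, with a Spanish instruction.
example = {
    "conversations": [
        {"from": "system", "value": "Eres un asistente útil."},
        {"from": "human", "value": "Resume este texto en una frase."},
    ]
}

prompt = sharegpt_to_chatml(example, add_generation_prompt=True)
print(prompt)
```

When prompting the model, a string built this way could be passed to the tokenizer, provided the released tokenizer contains the added special tokens listed in the configuration.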
|
|
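
The configuration sets `learning_rate: 0.00007`, `lr_scheduler: cosine`, and `warmup_ratio: 0.03`. The sketch below illustrates the usual shape of such a schedule: a linear warmup to the peak rate followed by a cosine decay to zero. It assumes the scheduler follows this common form. The function `lr_at` is hypothetical, and `total_steps` is a made-up value because the card does not state the real number of optimizer steps.

```python
import math


def lr_at(step, total_steps, peak_lr=7e-5, warmup_ratio=0.03):
    """Linear warmup to `peak_lr`, then cosine decay to zero (illustrative)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; the real step count is not given in this card
for step in (0, 15, 30, 500, 1000):
    print(step, f"{lr_at(step, total_steps):.2e}")
```

Under these assumptions the rate starts at zero, reaches the peak of 7e-5 after about 3% of the steps, and falls back to zero by the final step.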
|
|
|
|